Vol.108 (2) June 2017 SOUTH AFRICAN INSTITUTE OF ELECTRICAL ENGINEERS 41 June 2017 Volume 108 No. 2 www.saiee.org.za Africa Research Journal ISSN 1991-1696 Research Journal of the South African Institute of Electrical Engineers Incorporating the SAIEE Transactions


(SAIEE FOUNDED JUNE 1909 INCORPORATED DECEMBER 1909)
AN OFFICIAL JOURNAL OF THE INSTITUTE

ISSN 1991-1696

Secretary and Head Office
Mrs Gerda Geyer
South African Institute for Electrical Engineers (SAIEE)
PO Box 751253, Gardenview, 2047, South Africa
Tel: (27-11) 487-3003
Fax: (27-11) 487-3002
E-mail: [email protected]

SAIEE AFRICA RESEARCH JOURNAL

ARTICLES SUBMITTED TO THE SAIEE AFRICA RESEARCH JOURNAL ARE FULLY PEER REVIEWED PRIOR TO ACCEPTANCE FOR PUBLICATION

Additional reviewers are approached as necessary.

The following organisations have listed SAIEE Africa Research Journal for abstraction purposes: INSPEC (The Institution of Electrical Engineers, London); ‘The Engineering Index’ (Engineering Information Inc.)

Unless otherwise stated on the first page of a published paper, copyright in all materials appearing in this publication vests in the SAIEE. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, magnetic tape, mechanical photo copying, recording or otherwise without permission in writing from the SAIEE. Notwithstanding the foregoing, permission is not required to make abstracts on condition that a full reference to the source is shown. Single copies of any material in which the Institute holds copyright may be made for research or private use purposes without reference to the SAIEE.

EDITORS AND REVIEWERS

EDITOR-IN-CHIEF
Prof. B.M. Lacquet, Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, SA, [email protected]

MANAGING EDITOR
Prof. S. Sinha, Faculty of Engineering and the Built Environment, University of Johannesburg, SA, [email protected]

SPECIALIST EDITORS

Communications and Signal Processing:
Prof. L.P. Linde, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Prof. S. Maharaj, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Dr O. Holland, Centre for Telecommunications Research, London, UK
Prof. F. Takawira, School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, SA
Prof. A.J. Han Vinck, University of Duisburg-Essen, Germany
Dr E. Golovins, DCLF Laboratory, National Metrology Institute of South Africa (NMISA), Pretoria, SA

Computer, Information Systems and Software Engineering:
Dr M. Weststrate, Newco Holdings, Pretoria, SA
Prof. A. van der Merwe, Department of Informatics, University of Pretoria, SA
Dr C. van der Walt, Modelling and Digital Science, Council for Scientific and Industrial Research, Pretoria, SA
Prof. B. Dwolatzky, Joburg Centre for Software Engineering, University of the Witwatersrand, Johannesburg, SA

Control and Automation:
Prof. K. Uren, School of Electrical, Electronic and Computer Engineering, North-West University, SA
Dr J.T. Valliarampath, freelancer, SA
Dr B. Yuksel, Advanced Technology R&D Centre, Mitsubishi Electric Corporation, Japan
Prof. T. van Niekerk, Dept. of Mechatronics, Nelson Mandela Metropolitan University, Port Elizabeth, SA

Electromagnetics and Antennas:
Prof. J.H. Cloete, Dept. of Electrical and Electronic Engineering, Stellenbosch University, SA
Prof. T.J.O. Afullo, School of Electrical, Electronic and Computer Engineering, University of KwaZulu-Natal, Durban, SA
Prof. R. Geschke, Dept. of Electrical and Electronic Engineering, University of Cape Town, SA
Dr B. Jokanović, Institute of Physics, Belgrade, Serbia

Electron Devices and Circuits:
Dr M. Božanić, Azoteq (Pty) Ltd, Pretoria, SA
Prof. M. du Plessis, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Dr D. Foty, Gilgamesh Associates, LLC, Vermont, USA

Energy and Power Systems:
Prof. M. Delimar, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

Engineering and Technology Management:
Prof. J-H. Pretorius, Faculty of Engineering and the Built Environment, University of Johannesburg, SA
Prof. L. Pretorius, Dept. of Engineering and Technology Management, University of Pretoria, SA

Engineering in Medicine and Biology:
Prof. J.J. Hanekom, Dept. of Electrical, Electronic & Computer Engineering, University of Pretoria, SA
Prof. F. Rattay, Vienna University of Technology, Austria
Prof. B. Bonham, University of California, San Francisco, USA

General Topics / Editors-at-large:
Dr P.J. Cilliers, Hermanus Magnetic Observatory, Hermanus, SA
Prof. M.A. van Wyk, School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, SA

INTERNATIONAL PANEL OF REVIEWERS
W. Boeck, Technical University of Munich, Germany
W.A. Brading, New Zealand
Prof. G. De Jager, Dept. of Electrical Engineering, University of Cape Town, SA
Prof. B. Downing, Dept. of Electrical Engineering, University of Cape Town, SA
Dr W. Drury, Control Techniques Ltd, UK
P.D. Evans, Dept. of Electrical, Electronic & Computer Engineering, The University of Birmingham, UK
Prof. J.A. Ferreira, Electrical Power Processing Unit, Delft University of Technology, The Netherlands
O. Flower, University of Warwick, UK
Prof. H.L. Hartnagel, Dept. of Electrical Engineering and Information Technology, Technical University of Darmstadt, Germany
C.F. Landy, Engineering Systems Inc., USA
D.A. Marshall, ALSTOM T&D, France
Dr M.D. McCulloch, Dept. of Engineering Science, Oxford, UK
Prof. D.A. McNamara, University of Ottawa, Canada
M. Milner, Hugh MacMillan Rehabilitation Centre, Canada
Prof. A. Petroianu, Dept. of Electrical Engineering, University of Cape Town, SA
Prof. K.F. Poole, Holcombe Dept. of Electrical and Computer Engineering, Clemson University, USA
Prof. J.P. Reynders, Dept. of Electrical & Information Engineering, University of the Witwatersrand, Johannesburg, SA
I.S. Shaw, University of Johannesburg, SA
H.W. van der Broeck, Phillips Forschungslabor Aachen, Germany
Prof. P.W. van der Walt, Stellenbosch University, SA
Prof. J.D. van Wyk, Dept. of Electrical and Computer Engineering, Virginia Tech, USA
R.T. Waters, UK
T.J. Williams, Purdue University, USA

Published by
South African Institute of Electrical Engineers (Pty) Ltd, PO Box 751253, Gardenview, 2047
Tel. (27-11) 487-3003, Fax. (27-11) 487-3002, E-mail: [email protected]

President: Mr J Machinjike
Deputy President: Dr H. Heldenhuys
Senior Vice President: Mr G Debbo
Junior Vice President: Mrs S Gourrah
Immediate Past President: Mr TC Madikane
Honorary Vice President: Dr B Kotze


VOL 108 No 2
June 2017

SAIEE Africa Research Journal

Using Automated Keyword Extraction To Facilitate Team Discovery In A Digital Forensic Investigation Of Electronic Communications
W.J.C. Van Staden and E. Van Der Poel

Personal Information And Regulatory Requirements For Direct Marketing: A South African Insurance Industry Experiment
A. Da Veiga and P. Swartz

Specific Emitter Identification For Enhanced Access Control Security
J.N. Samuel and W.P. Du Plessis

Manet Reactive Routing Protocols Node Mobility Variation Effect In Analysing The Impact Of Black Hole Attack
E.O. Ochola, L.F. Mejaele, M.M. Eloff and J.A. Van Der Poll

SAIEE Africa Research Journal Editorial Staff


GUEST EDITORIAL

INFORMATION SECURITY SOUTH AFRICA (ISSA) 2016

This special issue of the SAIEE Africa Research Journal is devoted to selected papers from the Information Security South Africa (ISSA) 2016 Conference which was held in Johannesburg, South Africa from 17 to 18 August 2016. The aim of the annual ISSA Conference is to afford information security practitioners and researchers, from all over the globe, an opportunity to share their knowledge and research results with their peers. The conference in 2016 focused on a wide spectrum of aspects in the information security domain, including the functional, business, managerial, human, theoretical and technological aspects of modern-day information security.

With the assistance of the original reviewers, ten conference papers that had received good overall reviews were identified. I attended the presentation of each of these papers and, based on the reviewer reports and the presentations, seven of these papers were selected for possible publication in this special issue. The authors of these seven selected papers were asked to rework their papers by expanding and/or further formalising the research conducted. Six of these papers were submitted and subsequently reviewed again by a minimum of three reputable international subject specialists. These reviews made it possible to take a confident decision on the inclusion of these papers in the special issue.

After the review process was completed, including attending to the reviewers’ suggestions, only four papers were selected to be published in this special issue. These four papers cover various aspects of information security, which include: team discovery in a digital forensic investigation of electronic communications; processing of personal information in line with regulatory requirements for direct marketing; specific emitter identification for enhanced access control; and the impact of the black hole attack on MANET reactive routing protocols. This special issue therefore includes four rather diverse papers in the discipline of information security, providing a true reflection of the multidisciplinary nature of this field of study.

I would like to thank the expert reviewers who diligently reviewed these papers. These reviews certainly contributed to the quality of this special issue.

To conclude, I would like to express my appreciation to IEEE Xplore who originally published the ISSA Conference papers, and for granting permission for these reworked papers to be published in this special issue.

Prof. Stephen V. Flowerday
Guest Editor


USING AUTOMATED KEYWORD EXTRACTION TO FACILITATE TEAM DISCOVERY IN A DIGITAL FORENSIC INVESTIGATION OF ELECTRONIC COMMUNICATIONS

W.J.C. van Staden∗ and E. van der Poel†

∗ School of Computing, UNISA Science Campus, Florida Park, South Africa. E-mail: [email protected]
† School of Computing, UNISA Science Campus, Florida Park, South Africa. E-mail: [email protected]

Abstract: A major problem that often occurs in Digital Forensics (DF) is the huge volume of data that has to be searched, filtered, and indexed to discover patterns that could lead to forensic evidence. The nature of the data, and the process by which it is collected, imply that the data also contains information about persons who are not implicated, or are only incidentally involved, in the crime under investigation. Privacy is therefore an important issue that needs to be managed in a DF investigation. This paper shows that techniques used in the Team Formation (TF) task can be successfully applied to address both the problems of data volume and privacy. The TF task can be re-formulated to fit the DF arena: to commit a crime, the culprit(s) may require the assistance of several other individuals, which implies that a team of some sort gets established. During a post-mortem DF analysis, an investigator may only have one, or a few, names to start with. One of the key challenges is finding possible co-conspirators. From a TF point of view, the culprit is trying to find the best team to commit the crime, given some constraints. The TF task in DF requires the recording of skill-sets, and the generation and/or discovery of a graph depicting interaction between candidates. If the data consists of an email corpus and people’s roles in an organisation (such as in the Enron data), both of these are readily available. In this paper we consider the TF problem in general and extend it to the DF arena by considering the information that an investigator may have access to during the investigation. We also show that simple information retrieval and keyword extraction techniques (such as RAKE) can be used to automatically discover potential teams from the data, while preserving privacy. Results from a series of experiments (using the new definitions of TF and the proposed information retrieval techniques) on the Enron data are then presented.

Key words: Digital Forensics, Digital Forensic Investigation, Cyber-crime, Team-formation, Social Network Analysis, Expert Finding

1. INTRODUCTION

The post-mortem forensic analysis of communications data, such as an email corpus, can be an extremely difficult and time-consuming task due to the volume and weakly structured nature of the data. The analysis process usually involves a traditional brute-force search for specific patterns, filtering to reduce the search space, and indexing of the resulting documents, or parts of documents. The patterns, filters and indexing mechanisms are often hand-crafted by the investigator, and are usually specific to the potential crime being investigated. That is, ‘hunches’ provide the initial stepping stones for an investigative effort during the early stages.

Proposed techniques use machine learning [1] and data-mining techniques [2] to guide the investigator’s efforts by highlighting ‘low-hanging fruit’. These techniques and tools save time and allow the investigator to more quickly find results that could lead to evidence. Another idea would be to explore the data to find possible teams within the forensic data.

The creation and formation of teams have been studied in operations research and in the management and social sciences. In operations research the Team Formation (TF) problem consists of optimally assigning people with certain skills to a task to be accomplished, for example building a software development team. In the social sciences, TF techniques are often used for post-hoc discovery of teams from individuals’ communication patterns. Graphs showing communication habits are constructed from these patterns, but the focus is not necessarily on the ability of such persons to form a team around a particular set of tasks.

Crimes often involve the creation of teams, although such a team would not be as rigid and designed as in the case of a software development team. Such a team is likely to be sub-optimal from a skills perspective, as there is the additional constraint that the potential team member has to be willing, or be able to be coerced, to commit acts that would assist in the crime. There may even be unwitting team members, who participate in the crime through the simple act of doing their jobs. The TF task in the planning and execution of a crime therefore has possible additional dimensions. This also implies that the members of a team may not all be aware of the crime being committed; the construction of a team could thus include members who are simply used as ‘tools’ in order to commit the crime.

This paper shows that techniques used in TF discovery can be applied to the Digital Forensics (DF) task to automatically discover potential teams involved in the crime. This means that the investigator has a much smaller set of potential culprits to start investigating than when using more traditional investigation techniques. It also has the benefit that the investigator does not need to look at the data of potentially innocent persons whose data happens to form part of the corpus. This has positive implications for privacy.

Based on: “Team Formation in Digital Forensics”, by W.J.C. van Staden and E. van der Poel which appeared in the Proceedings of Information Security South Africa (ISSA) 2016, Johannesburg, 17 & 18 August 2016. © 2016 IEEE

The TF problem is therefore considered from the perspective of the culprit(s): if someone wanted to commit a crime, who would the best team be to accomplish this? The word ‘team’ should be considered a loose term, as the team may involve people who are simply doing their normal jobs, or people who have information required to accomplish aspects of the crime, and who may or may not know that they are providing the information to aid in the commission of a crime.

TF techniques can be viewed as intelligent automated filters that aim to reduce the list of potential suspects, ideally substantially. As in any investigation, these persons should remain ‘just’ suspects until further corroborating evidence is found.

To illustrate the concepts of applying TF discovery in DF, the Enron email corpus∗ was used as the data under investigation. The Enron data-set has undergone several releases in which data has been removed (at the request of persons whose data was within the data-set), so the data provided can no longer be used to identify those who were indicted, implicated, or sentenced. Hence, for the moment, we cannot provide error rates or accuracy (recall and precision). However, it is important to understand that the purpose of the proposed techniques is not to provide an automated system for solving cyber-crime. The purpose is to provide tools and techniques that can guide an investigator through the investigation and, importantly, potentially protect the privacy of parties that may not be involved in the crime.

1.1 Contribution

This paper contributes to the field of Digital Forensics (DF) by applying techniques of the Team Formation (TF) task from a digital forensic perspective. It is argued that the TF task can be applied during a post-mortem analysis of seized data to guide the investigator, by narrowing down the list of suspects, focusing on persons of immediate interest, and avoiding investigating potentially innocent persons. To facilitate the use of TF, however, the team formation task has to be placed in the correct context.

In general, TF considers social network graphs and potential team members’ skills and expertise to build a team to complete a specific task. The important difference between this work and others is that the team formation problem is framed in the DF paradigm, specifically with the focus on guiding the investigator during the analysis.

∗The Enron corpus was downloaded from http://tinyurl.com/myjmcjl

It is shown that standard Information Retrieval (IR) techniques can be employed to extract information from an email corpus that can lead to identifying teams. The formulation of the TF task in the DF paradigm will allow further research into automation of the guidance provided to the investigator. A formal notation for the TF task is also proposed. This notation can be used when reasoning about the team formation problem in this and future research.

We further show that automated keyword extraction (also a technique from the IR field) can aid TF by identifying potentially telling keywords and phrases that identify persons within the corpus.

Additionally, by allowing the investigator to focus specifically on persons of interest (i.e. those in the team), the privacy of others whose data forms part of the seized data may be protected.

1.2 Structure of the paper

The rest of the paper is structured as follows:

• Section 2 provides background information on DF, the TF task and related work.

• Section 4 frames the TF task in the DF paradigm, and provides formal definitions for ranking individuals.

• Section 5 provides some examples of the application of the ideas presented in the paper to the Enron mail corpus.

• Section 5.2 discusses the use of automated keyword extraction techniques in order to explore the Enron mail corpus.

• Section 6 presents a discussion on the techniques applied in this paper.

• Finally, section 7 provides concluding remarks.

2. BACKGROUND AND RELATED WORK

Reformulation of the team formation problem draws on two important bodies of work. Firstly, DF provides the paradigm within which the problem is contextualised. Secondly, the team formation problem provides the concepts and tools needed to reformulate and understand the problem. Each of these is discussed in turn in the following sections. In order to illustrate the potential use of the work, the use of IR and keyword extraction techniques is also presented.

2.1 Digital Forensics

Digital Forensics (DF) is defined as the “...preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of evidence in a digital context [3].” Using sound forensic techniques and proper controls, digital data that could potentially be evidence is gathered, analysed and presented in context as part of the cyber-crime investigation. Pollitt [4] calls this the creation of a narrative.

This paper is concerned with digital evidence in the form of data and, in particular, with the post-mortem analysis (as opposed to live analysis) of de-obfuscated data. Since data can be hidden, a lot of DF research goes into the finding and identification of data. These techniques involve file-carving to find deleted data [5, 6] and similarity hashes to identify files or parts of files [7, 8], to name but two∗∗. Once data has been de-obfuscated, that is, once its meaning can be readily inferred, an analysis of the content can be done which will contribute to the narrative.

The analysis of the data can also be seen as a de-obfuscating effort (since data is now added to the narrative, and therefore its meaning in the narrative becomes clear). However, this paper will stick to the term analysis in order to avoid confusion.

Sifting through large volumes of data is typically accomplished through brute-force approaches in which strings of data are matched against search queries, or where meta-data is matched against search queries. Such meta-data consists of file-types, time-stamps, file-ownership and so on. Fei et al. [1] propose the use of Self-Organising Maps (SOMs) [9] to guide the investigator. Their technique uses meta-data to detect anomalies in the data, and the investigator is thus guided by focussing analysis on those pieces of data. Fahdi et al. [10] also employ SOMs for automated discovery of potential evidence.

Beebe has proposed the use of text-mining to achieve better retrieval rates [11] and as a way to search through large corpora [2], and Pollitt has shown that Natural Language Processing (NLP) techniques such as Named Entity Extraction (NEE) can be useful during the creation of the narrative [4].

The use of automated guidance during a forensic investigation is therefore well established, and this paper builds on those ideas.

2.2 Expert finding and Team Formation

Expert finding is the problem of identifying individuals who may hold knowledge on a given topic. This problem dates back as far as the 1990s [12], and the challenge set by the Text Retrieval Conference (TREC) in 2005 set the scene for renewed research in the field [13].

The central problem in expert finding is estimating the expertise of an individual. The most notable approaches [12, 14] use a probability distribution model in order to estimate the expertise level. Zhang et al. [15] propose a propagation-based approach to finding an expert within a social network.

∗∗The decryption of data is also, of course, part of the de-obfuscation problem.

The use of social graphs to find criminal associations has been studied by Xu et al. [16]. They use shortest-path algorithms to identify associations in criminal networks. However, their evaluation is run purely on the associativity of the links in the network.

Once an expert is found, a social graph is typically used to establish a team of experts within the graph. Team formation is a well-researched problem outside digital forensics. Lappas et al. [17] make use of minimum spanning trees to build a team of experts on topics within a social graph. They show that constructing such a structure is NP-hard.

Rangapuram et al. [18] extend team formation as presented by Lappas et al. to include budget and location constraints. They also allow an upper bound on the team size, as well as a constraint to indicate the minimum level of expertise required to complete the task the team is identified for.

Rahman [19] considers the team formation problem from an economic perspective, and introduces the concepts of opaque and translucent teams. An opaque team shares knowledge within the team in order to maximise the operation of the team. In a translucent team, some information may purposefully remain hidden in order to enhance the attractiveness of the team. Such translucent teams, although not part of this paper, may provide an interesting topic of study once the team formation problem in the DF sphere is well defined.

3. AUTOMATED KEYWORD EXTRACTION

Keyword extraction is the action of scanning text documents with the explicit goal of finding keywords that describe the document under consideration. The simplest techniques in this field use single keywords or n-grams. Beliga et al. [20] provide a good overview of automated extraction techniques. Other techniques that yield good results, such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) [21], can also be used. However, we have not tested these on short unstructured emails such as those present in the Enron corpus, and this is left for future work. We only present three techniques which relate to the focus of the paper.

Using single words as topic keywords relies on a technique called Term-Frequency-Inverse-Document-Frequency (tf-idf) [22]. This technique uses the relative importance of a word within a document as an indicator of its importance within a corpus of documents. It does so by counting the number of times a term appears within the document under consideration, and weighting that count by the logarithm of the ratio between the number of documents in the corpus and the number of documents in which the term appears. The standard formula is given as:

$$k_{d,k} = \log(1 + tf_{d,k}) \times \log\left(\frac{|D|}{|D_k| + 1}\right) \qquad (1)$$
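As a minimal sketch of how such a weight could be computed, the Python below assumes the reading log(1 + tf) × log(|D| / (|D_k| + 1)) of equation (1), where tf is the count of a term in a document, |D| the number of documents and |D_k| the number of documents containing the term. The function name `tf_idf` and the four toy documents are hypothetical and are not taken from the Enron corpus.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per document.

    corpus: list of documents, each a list of lower-cased tokens.
    Assumed weighting: log(1 + tf) * log(|D| / (|D_k| + 1)).
    """
    n_docs = len(corpus)
    # |D_k|: number of documents that contain term k at least once
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))
    weights = []
    for doc in corpus:
        term_counts = Counter(doc)
        weights.append({
            term: math.log(1 + count) * math.log(n_docs / (doc_freq[term] + 1))
            for term, count in term_counts.items()
        })
    return weights

# Hypothetical toy corpus, for illustration only
corpus = [
    "please review the pipeline contract".split(),
    "the pipeline schedule changed".split(),
    "lunch on friday".split(),
    "the meeting moved".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus the rare term "contract" outweighs the more common "pipeline", and "the" receives a weight of zero. Under this particular smoothing, a term that occurs in every document would receive a slightly negative weight.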

The tf-idf metric works well with single-word keywords. However, in many cases a key-phrase is more descriptive of the document under consideration. An n-gram based approach works well in this instance. In order to explain the notion of such an n-gram based approach, the concepts of bi-grams, tri-grams, and so on are explained first.

A bi-gram (or 2-gram) is a two-word pair; when using bi-grams, the entire document is represented as a collection of two-word pairs. For example, for a document D consisting of words $(w_1, w_2, \ldots, w_l)$, the bi-gram representation is $((w_1, w_2), (w_2, w_3), \ldots, (w_{l-2}, w_{l-1}), (w_{l-1}, w_l))$. Using this approach it is possible to construct tri-grams, quad-grams, up to a generalised n-gram representation.
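The sliding-window construction described above can be sketched in a few lines of Python. The helper name `ngrams` is an assumption made for illustration.

```python
def ngrams(words, n):
    """Return the list of n-word tuples of `words`, in reading order."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = ["w1", "w2", "w3", "w4"]
bigrams = ngrams(words, 2)   # [("w1", "w2"), ("w2", "w3"), ("w3", "w4")]
trigrams = ngrams(words, 3)  # [("w1", "w2", "w3"), ("w2", "w3", "w4")]
```

A document of l words yields l − n + 1 n-grams, and none at all when n exceeds l.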

The n-gram approach suffers from scalability problems: determining the correct value of n becomes difficult, since a document has to be scanned multiple times in order to discover 2, 3, 4, ..., n combinations of words as key-phrases in the document. In most cases a bi-gram or tri-gram approach is sufficient to capture the most general keywords. A general advantage of the tri-gram approach is that it does not suffer from ‘stop word’ exclusion in the key-phrase.

A fast approach to the keyword extraction problem with acceptable results, Rapid Automated Keyword Extraction (RAKE), was presented by Rose et al. [23]. RAKE uses word co-occurrences as a way to determine key-phrase boundaries. A document is split into sentences, and each sentence is divided at its stop-words. Typical stop-words are words such as determiners, coordinating conjunctions, and so forth. Whatever remains is considered to be the candidate key-phrases. Using a co-occurrence matrix, each key-phrase is given a weight. In the typical approach two weights are computed for each word w: freq(w), which is the number of times the word appears in the document, and deg(w), which is the number of words that co-occur with w in key-phrases.

Using stop-words as key-phrase boundaries does result in certain phrases being missed (such as phrases of the form x of x). To avoid this, an implementation of the algorithm can implement stop-word spanning, which accepts a key-phrase containing a stop-word as a legitimate phrase based on some criterion; the original RAKE implementation accepts a key-phrase with a stop word if it occurs at least twice in the document.
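The candidate-splitting and scoring steps can be illustrated with a simplified sketch. It is not the original RAKE implementation: the stop-word list is a tiny hypothetical one, stop-word spanning is omitted, and each phrase is scored as the sum of deg(w)/freq(w) over its words. Here freq(w) counts occurrences of w in candidate phrases and deg(w) counts the words it co-occurs with, including itself.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "was"}

def candidate_phrases(text):
    """Split text into sentences, then split each sentence at stop-words."""
    phrases = []
    for sentence in re.split(r"[.!?,;:\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", sentence):
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake(text):
    """Return {phrase: score}, where score is the sum of deg(w) / freq(w)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)  # occurrences of each word in candidate phrases
    deg = defaultdict(int)   # co-occurrence degree of each word
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            deg[word] += len(phrase)  # counts the word itself as well
    return {
        " ".join(phrase): sum(deg[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }

# Hypothetical example text
text = "The trading desk approved the energy contract. The energy contract was signed."
scores = rake(text)
best = max(scores, key=scores.get)
```

For this example text the longest candidate, "trading desk approved", ranks highest, which reflects the tendency of the deg/freq ratio to favour words that occur mainly in longer phrases.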

The following section formulates the TF task in DF.

4. THE TEAM FORMATION PROBLEM IN DIGITAL FORENSICS

Generally speaking, a (cyber-)criminal contemplating a crime has the same problem as a project manager: find a team that will successfully complete a project. The project requires a specific set of skills and/or knowledge related to the task. A project manager aims to find the best group of experts that the budget will afford. All the team members will have full knowledge of their role in the team. The criminal, on the other hand, has a more complex notion of ‘afford’, in that the criminal should be able to convince or influence potential members to commit parts of the crime. This means that the team may well not consist of the ‘best’ experts. There are also likely to be team ‘members’ who are not aware of their role in the crime, or are not even aware that a crime is being committed, because they contribute through the simple execution of their jobs or the sharing of their knowledge. We define ‘aid’ as either the execution of a specific task, such as a job function, or the sharing of specific knowledge to assist in the execution of specific tasks.

The team formation problem is therefore formulated for DF investigations as follows:

Definition I The Team Formation Problem in a Digital Forensics Context

Given a set of individuals Ψ, a set of topics they have knowledge about Θ, a graph depicting their communication habits G = ⟨V, E⟩ (where V is a set of vertices representing the individuals and E is a set representing the edges between the vertices from V), and a topical definition of a committed act, find Γ ⊂ Ψ which depicts a likely team needed to either commit the act, or to provide aid in order for the act to be committed (the graph provides clues to persons who may potentially collude in order to accomplish a specific task).

A formal definition of the notation used in formulating the team formation problem in the DF context is provided in definition 4.1.
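To make the inputs of Definition I concrete, the following sketch builds an undirected communication graph from (sender, recipient) pairs and selects the persons near a known starting name whose topics overlap the topical definition of the act. It is one naive reading of the definition, not the ranking proposed in this paper. The function names, the `max_hops` cut-off and the example persons and topics are all hypothetical.

```python
from collections import defaultdict

def build_graph(messages):
    """Build an undirected graph G = <V, E> from (sender, recipient) pairs."""
    graph = defaultdict(set)
    for sender, recipient in messages:
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return graph

def likely_team(start, graph, topics, act, max_hops=2):
    """Return persons within max_hops of `start` whose topics overlap the act.

    topics: {person: set of topics}; act: set of topics describing the act.
    """
    team = set()
    frontier, seen = {start}, {start}
    for _ in range(max_hops + 1):  # hop 0 is the starting person
        next_frontier = set()
        for person in frontier:
            if topics.get(person, set()) & act:
                team.add(person)
            next_frontier |= graph.get(person, set()) - seen
        seen |= next_frontier
        frontier = next_frontier
    return team

# Hypothetical data, for illustration only
messages = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]
topics = {
    "alice": {"accounts"},
    "bob": {"trading"},
    "carol": {"offshore", "accounts"},
    "dave": {"offshore"},
    "erin": {"catering"},
}
act = {"offshore", "accounts"}
team = likely_team("alice", build_graph(messages), topics, act, max_hops=2)
```

Here `team` contains alice and carol. Dave shares a topic with the act but sits three hops from the starting name, so a larger `max_hops` would be needed to include him. The hop limit is a design choice of this sketch that trades recall against the number of persons whose data is examined.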

It is important to understand the notion of a ‘likely’ team. The suspect may not have looked for the most influential people, or for all the experts, in order to commit a crime; any person who has the knowledge, or who can lead to the knowledge, may be sufficient. In particular, the criminal may have had individuals in mind who had knowledge, and whom he would be able to influence.

This leads to a paradox in the existing definitions of team formation: teams may not consist of the best choices, and may more than likely resemble translucent teams [19] in which the criminal and co-conspirators hold a residual claim on the team. This paradox is defined as follows.

Definition II The Team Formation Problem Paradox

In order to accomplish the task at hand, the cyber-criminal’s choice of team may not consist of the experts, or seats of power in the organisation. Normal team formation analysis techniques rely on building a team from influential people or experts, meaning traditional team formation analysis techniques may be of limited use in this case.

Additionally, the suspect may not be part of the team produced during a traditional TF analysis.

This does not mean that traditional team formation analysis techniques are useless. Since traditional team formation coupled with Social Network Analysis (SNA) provides valuable information on the potential team that could be formed, these techniques can act as a good guide during the investigative process.


The team formation problem as defined above therefore requires de-obfuscated data from which the following can be derived: a social graph for the persons under investigation, topics extracted from the data, and a framing of the act in terms of the topics. This last concept is important, since the investigator must have enough knowledge of the domain being investigated in order to frame the act in terms of the topics, which leads to the following definition of the act or crime.

Definition III The Crime as a Task

In the team formation problem for cyber-crime, an act is a task that can be defined based on the knowledge that is required to complete it. Knowledge can be encoded into language phrases, of which several can be used to define the act.

Based on the above requirements, the team formation problem is considered with respect to seized email data. Email data is a convenient choice because:

1. A social graph can easily be constructed from the email data in an automated fashion.

2. Extracting topics from the data can be approximated by performing noun-phrase and named-entity extraction. Moreover, general IR techniques allow the easy indexing of large email corpora.

3. The terms used to define the act will correspond to the extracted terms and can thus be used during the guided investigation.

The following section considers the examination of email data.

4.1 Examining Email Corpora

Given the team formation problem as defined in Definition I, this section considers the identification of what is termed a candidate team. This is a team that consists of all the individuals that could potentially form part of an ideal team. An ideal team is a team that may have fit the requirements of the suspect.

The Aardvark social search engine [14] attempted to find individuals that may have been able to answer questions from other individuals. It did so by determining the likelihood that a particular individual would be able to answer a question on a certain topic. Aardvark uses NLP techniques, as well as crafted profiles, to build its model of users and their ability to answer questions on particular topics.

This paper builds on this idea by showing that an easy approximation for topics, together with the social network of the individuals, can be used to build a likely team (Definition I) for committing the crime.

To accomplish this, the following is to be done prior to the analysis phase:

1. Create an index on topics for the corpus,

2. Create a communications network for the users of the mail system,

3. Define the act using nomenclature from the enterprise context,

4. Generate a sub-graph depicting the individuals involved in communication about the topic,

5. Use the sub-graph as a basis for further analysis and investigation.

The set of topics each team member is knowledgeable on is derived through IR techniques from the seized email corpus S.

For any corpus S, the following is defined for the team formation problem in cyber-security:

Definition IV Team formation problem notation

The following notation is defined for the team formation problem:

1. Θ represents all topics embedded in S,

2. θ ⊆ Θ is the set of all topics that form part of a search on S,

3. Ψ represents all the individuals within the corpus,

4. $\delta_u$ represents all the documents directly related to individual u ∈ Ψ. Directly related means that this individual has a copy of the document in their possession.

5. ψ ⊆ Ψ is the set of individuals who are under consideration. It may be that certain individuals are excluded from the investigation from the start; therefore, although S may be about Ψ, only the set ψ is under consideration. As the investigation progresses more individuals may be added to Ψ and removed from ψ (or vice versa).

6. $\delta_u^t$ is the set of all documents for user u on topic t ∈ Θ,

7. util(u) is a utility rating for u,

8. G = <V, E> is the social graph depicting the interaction between all u ∈ Ψ, with V ⊆ Ψ and $E = \{(u_k, u_j) \mid u_k, u_j \in V\}$.

For every individual in S, it is clear that their share of the mail will be a representation of the set of topics they deal with on a daily basis. Having no other information, it is reasonable to assume that this is a reflection of their knowledge on different topics. Consider for example an employee who spends ninety percent of their time corresponding about new contracts. It is reasonable to assume that they have knowledge on contracts and at least some of the process around them. The utility of this individual to the team is thus a function of the probability of the user given that topic t is discussed.

$$\mathrm{util}(u_i) = p(u_i \mid t) \qquad (2)$$

The utility function is purposefully provided as a function that could be used as part of an objective function calculation, since equation 2 can be changed to represent specific constraints. As it stands, equation 2 assumes a steady state; that is, new information that becomes available during the investigation is not considered. Consider, for example, a deposition which reveals beyond doubt that a particular individual had knowledge pertinent to the investigation. The utility function could be modified to reflect this, and the selection of the candidate team would change. In section 5.2 the utility function is changed to use a RAKE-specific calculation in order to present candidate teams and persons of interest.

Searching for the topic t ∈ Θ, the result corpus s ⊆ S will contain emails exchanged by individuals within the enterprise. Depending on the nature of the topic, the likelihood of an individual $u_i$ corresponding (either receiving or sending an email) on the particular topic is (using Bayes’ theorem):

$$p(u_i \mid t) = \frac{p(t \mid u_i)\, p(u_i)}{p(t)}$$

Since S is available as the sample space, it is easy to calculate $p(t \mid u_i)\, p(u_i) = p(u_i \cap t)$, which in turn is calculated as in equation 3.

$$p(u_i \cap t) = \frac{|\delta_{u_i}^{t}|}{|S|} \qquad (3)$$

Here $\delta_u^t$ is the set of all documents covering topic t from individual u (as defined in Definition IV), and |S| is the size of the entire corpus.

Individuals can now be ranked based on the utility they could potentially add to the team (since $\sum_{u_i \in \Psi} p(u_i \mid t) = 1$).
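The ranking by equations 2 and 3 can be illustrated with a minimal sketch. It assumes that every document is counted once per mailbox it was found in, that a plain substring match stands in for the topic search, and that p(t) is obtained by summing the joint probabilities so that the utilities add up to one. The function names and the toy corpus are hypothetical and are not taken from the experiment.

```python
from collections import Counter


def topic_utilities(corpus, topic):
    """Return {individual: p(individual | topic)} for (owner, text) documents."""
    size = len(corpus)  # |S|
    # |delta_u^t|: number of documents owned by u that mention the topic
    hits = Counter(owner for owner, text in corpus
                   if topic.lower() in text.lower())
    # Equation 3: p(u and t) = |delta_u^t| / |S|
    joint = {u: n / size for u, n in hits.items()}
    p_topic = sum(joint.values())  # p(t), assuming one owner per document
    if p_topic == 0:
        return {}
    # Bayes' theorem: p(u | t) = p(u and t) / p(t)
    return {u: p / p_topic for u, p in joint.items()}


def rank(utilities):
    """Sort individuals by decreasing utility, ties broken by name."""
    return sorted(utilities.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus of (mailbox owner, email text) pairs.
corpus = [
    ("alice", "New regulation on gas contracts"),
    ("alice", "Regulation update"),
    ("bob", "Lunch on Friday?"),
    ("bob", "Draft regulation response"),
    ("carol", "Audit schedule"),
]
utilities = topic_utilities(corpus, "regulation")
ranking = rank(utilities)
```

In this toy corpus alice owns two of the three documents on the topic and is therefore ranked first, while carol does not appear in the ranking at all.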

Based on the utility rank and the search result, it is possible to construct G′ = <V′, E′> where G′ ⊆ G, with the constraint that V′ ⊆ V. G′ is thus a sub-graph of G which depicts only the correspondence on topic t. From the investigator’s viewpoint, G′ presents the candidate team for aiding in a crime that requires knowledge on the subjects that will come from the individuals in the graph.

The resulting candidate team graph G′ can then be used with well-known social network techniques such as centrality, spanning trees to determine teams, and dense sub-graphs. However, at this point, the investigator can simply use G′ to guide the analysis of particular emails that could be evidence.
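One possible way of deriving G′ is sketched below. It assumes that G is stored as a set of directed (sender, recipient) pairs, that V′ is taken to be the k individuals with the highest utility, and that E′ keeps every edge of G between two members of V′ (the induced sub-graph). The helper name and the sample data are hypothetical.

```python
def candidate_subgraph(edges, utilities, k):
    """Return (V', E') induced by the k highest-utility individuals."""
    # Rank by decreasing utility; ties are broken alphabetically.
    top = sorted(utilities, key=lambda u: (-utilities[u], u))[:k]
    vertices = set(top)
    # Keep only the edges whose two endpoints are both in V'.
    sub_edges = {(a, b) for a, b in edges if a in vertices and b in vertices}
    return vertices, sub_edges


# Hypothetical social graph and utilities.
edges = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}
utilities = {"alice": 0.5, "bob": 0.3, "carol": 0.2}
v_prime, e_prime = candidate_subgraph(edges, utilities, k=2)
```

A vertex that ends up in V′ without any incident edge in E′ is still of interest: it indicates someone who corresponded on the topic, but not with the other candidates.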

Now that the concepts behind the team formation problem have been articulated, the following section provides some initial samples of generating G′ on the Enron email corpus.

5. EXPERIMENTAL RESULTS

In 2001, the Enron energy company was embroiled in a scandal relating to unlawful and unethical financial practices. Enron used complex financial techniques in order to hide its losses, thereby artificially boosting the company’s stock value. During the investigation, the email of several hundred of the key employees in Enron was seized and analysed.

Subsequently, the corpus was purchased and released by Andrew McCallum, who prepared the content and released the emails in a folder-based hierarchy, all in mbox (RFC4155) format [24]. Petitions by several individuals resulted in their emails being removed from the corpus, and the result is a corpus of one hundred and fifty individuals spanning around 517,000 emails.

There has been a lot of research done on the corpus, including data mining and social network analysis based on the communication links between individuals. The ideas presented here are (as far as the authors are aware) the first examination of a team formation problem on the Enron corpus, specifically with the team formation problem framed in the DF context.

The purpose of the experiment for this paper was to consider the team formation problem on a real-world set of data. It is shown that very simple techniques can go a long way in providing guidance to the investigator when sifting through volumes of data.

The experiment was conducted based on the steps outlined in section 4.1:

1. The entire email corpus (that was made available) was indexed, and an inverted index was created. This resulted in around 780,000 unique search terms for the 517,424 emails, all stored in RFC822 mbox format.

2. The communications network (or social network graph) of the persons involved was constructed.

3. Several key phrases representing ‘topics’ were used to search the corpus (thus describing the act in terms of the knowledge needed to commit it or to aid in committing it).

4. A sub-graph of the individuals who communicated about the topics was created, and merged into a graph that represents a candidate team for the act. We avoided listing only instances where persons in the candidate team communicated about the topic of interest, to allow the full nature of the interaction between individuals to come to light.
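Steps 1 to 3 above can be sketched in a few lines. The sketch assumes that each email has already been parsed into a dictionary with hypothetical `from`, `to` and `body` fields, uses a simple lower-case word tokeniser, and leaves out the stemming described in section 5.1; a multi-word query is treated as a conjunction of its terms rather than as an exact phrase. It is an illustration only, not the indexer used in the experiment.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphabetic tokens; stemming is deliberately omitted."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(emails):
    """Inverted index: term -> set of ids of the emails that contain it."""
    index = defaultdict(set)
    for doc_id, email in enumerate(emails):
        for term in tokenize(email["body"]):
            index[term].add(doc_id)
    return index


def build_graph(emails):
    """Communications network as a set of (sender, recipient) edges."""
    return {(email["from"], rcpt) for email in emails for rcpt in email["to"]}


def search(index, query):
    """Ids of the emails that contain every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical parsed emails.
emails = [
    {"from": "alice", "to": ["bob"],
     "body": "The service provider contract is ready"},
    {"from": "bob", "to": ["carol", "alice"],
     "body": "Audit of the provider next week"},
    {"from": "carol", "to": ["alice"], "body": "Lunch?"},
]
index = build_index(emails)
graph = build_graph(emails)
matches = search(index, "service provider")
```

The owners of the matching emails can then be counted per topic to obtain the utilities, and the edge set restricted to those individuals to obtain the candidate team graph of step 4.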

We approached the identification of candidate teams from two angles. First, we used simple information retrieval techniques based on keyword searches. This allows the investigator to find a candidate team based on a fast keyword search, but requires that some knowledge or ‘hunches’ of potentially interesting keywords are known a priori.

In the second approach we used an automated keyword extractor which provides important keywords within the corpus. This allows the investigator to start their search by using these extracted keywords as a starting point.

The information retrieval techniques are discussed next, followed by the automated keyword techniques. Finally, the results and findings are presented in section 6.

5.1 Information Retrieval Techniques

Some more comments on the information retrieval techniques used are in order. The term dictionary constructed from the corpus contains terms stemmed using the Porter stemmer, and queries run against the term database are stemmed before the search is done. The social network graph for the employees consists of the interaction between Enron employees based on their in-box and sent mail folders.

Although the graph consists of all persons interacting based on the information from the mentioned sources, the visual graphs presented are restricted in two ways: firstly, only individuals from within Enron are displayed on the visualisation, and secondly, based on the likelihood calculation presented in equation 2, only a limited number of individuals are included in the graph. Both of these restrictions are purely for ease of viewing: a visual graph depicting too many vertices and their links quickly degrades in readability and thus meaning (in printed format). It was thus decided to limit the number of nodes to something that would be meaningful and easily digestible.

Figures 1 and 2 represent constrained sub-graphs for the topics ‘regulation’ and ‘service provider’ (both provide the utility value for each individual in parentheses).

Figure 1 shows several vertices that are disconnected; this revealed individuals who were corresponding about ‘regulation’ but likely not with parties in Enron.

Lack of space prevents the presentation of all the sub-graphs; however, the candidate team graph which includes the topics presented above is provided in Figure 3. The following ‘topics’ were used for the generation: “Federal Energy Regulatory Commission”, “Regulation”, “Audit”, “Contract”, and “Service Provider”.

Visual inspection of these graphs alone already provides good clues as to who the individuals with potential knowledge to help with the act are. Knowledge of the structure of the organisation would enable the investigator to follow potential leads; thus the sub-graph can provide guided investigation.

[Figure 1 graph: query [regulation], 20 of 143 nodes shown. Node labels (utility): kean-s (0.207680), dasovich-j (0.200405), shapiro-r (0.050101), kaminski-v (0.035026), steffes-j (0.015375), fossum-d (0.015000), kitchen-l (0.013200), sanders-r (0.013125), lay-k (0.011400), lokay-m (0.012000), taylor-m (0.044176), shackleton-s (0.028201), haedicke-m (0.025726), mann-k (0.018225), jones-t (0.012900), nemec-g (0.010725), symes-k (0.022876), germany-c (0.018975), campbell-l (0.016425), hain-m (0.015000)]

Figure 1: Candidate Team for topic ‘regulation’

[Figure 2 graph: query [service provider], 20 of 149 nodes shown. Node labels (utility): dasovich-j (0.125795), kean-s (0.077939), shapiro-r (0.024620), lokay-m (0.014593), sanders-r (0.013732), kaminski-v (0.051036), beck-s (0.016089), lay-k (0.012946), jones-t (0.019157), shackleton-s (0.030046), taylor-m (0.025294), mann-k (0.021253), nemec-g (0.016576), haedicke-m (0.019083), lewis-a (0.021103), scott-s (0.018858), symes-k (0.017773), fossum-d (0.016127), keavey-p (0.012946), hain-m (0.014218)]

Figure 2: Candidate Team for topic ‘service provider’

5.2 Using Automated Keyword Extraction

In this section we consider the use of automated keyword extraction in order to provide potential clues as to good keywords within the corpus. To accomplish this we implemented a basic version of RAKE without stop-word spanning.

We first removed duplicate emails from the corpus, and then scanned each email in order to extract keywords. We also ignored words that contained numbers (as per Liu et al. [25], who potentially ignored anything that was not a noun/verb/adjective combination of words). This resulted in approximately 2.4 million keywords and phrases. It also resulted in a large number of nonsensical key-phrases (such as a repeating ‘a’, and several that can only be described as ‘random keyboard strokes’). Determining whether these are noise or material is an interesting topic, and is left for future work. Currently, we were only interested in presenting the investigator with keywords that could potentially be of interest; random words like these tended not to show up in the top keyword list for senders.

[Figure 3 graph: query [Federal Energy Regulatory Commission; regulation; Audit; Contract; Service Provider], 20 nodes shown. Node labels: beck-s, kitchen-l, haedicke-m, campbell-l, farmer-d, kean-s, lewis-a, shapiro-r, taylor-m, jones-t, kaminski-v, nemec-g, mann-k, shackleton-s, dasovich-j, sanders-r, hain-m, germany-c, symes-k, fossum-d]

Figure 3: Sub-graph for candidate team for query “Federal Energy Regulatory Commission”, “Regulation”, “Contract”, and “Service Provider”

Keywords were stored in a database which identified the email the phrase appeared in, the employee in whose mail directories the email was found, as well as deg(w), using the consideration from the creators of RAKE (we experimented with a tf-idf index ranking but found no significant advantage using a standard tf-idf approach; further investigation is left for future work).
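For readers unfamiliar with RAKE, the following is a minimal sketch of its basic form as described by Rose et al. [23]: candidate phrases are the runs of words between stop words and punctuation, deg(w) sums the lengths of the phrases in which w occurs, and a phrase is scored by summing deg(w)/freq(w) over its words. The tiny stop list is purely illustrative, and treating words that contain digits as phrase delimiters is an assumption; neither is taken from the implementation used in the experiment.

```python
import re
from collections import defaultdict

# Illustrative stop list only; a real run would use a full stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "is", "on",
              "please", "find"}


def candidate_phrases(text):
    """Split text into runs of content words between stop words/punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            # Stop words and (by assumption) words with digits end a phrase.
            if word in STOP_WORDS or any(ch.isdigit() for ch in word):
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each candidate phrase by the sum of deg(w) / freq(w)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    deg = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            deg[word] += len(phrase)  # co-occurrence degree, including itself
    return {" ".join(p): sum(deg[w] / freq[w] for w in p)
            for p in set(phrases)}


scores = rake_scores(
    "Please find the expense report for the audit. "
    "The expense report is final."
)
```

Multi-word phrases are favoured because each of their words accumulates a higher degree, which is why a phrase such as 'expense report' outranks the single words around it.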

We followed a similar approach to Rose et al. [23] in that only the top third of returned keywords are considered as extracted. Once a keyword is not in the top third of keywords for a particular email or user under consideration it is considered referenced only. That is, a keyword in the top third is considered ‘extracted’ and one that is not is considered ‘referenced’.

We used the same techniques as proposed by the creators of RAKE, and calculated the exclusivity and essentiality of each keyword. Both of these use the extracted frequency (edf(w)) and the reference frequency (rdf(w)) of a keyword.

The exclusivity of a keyword indicates how often a keyword is extracted when it appears in an email (i.e. how often the keyword is in the top third of keywords when it does appear as a keyword):

$$\mathrm{exc}(w) = \frac{\mathrm{edf}(w)}{\mathrm{rdf}(w)} \qquad (4)$$

Keywords are then ranked based on ‘essentiality’. This is simply an index generated from the exclusivity of a keyword and its reference frequency:

$$\mathrm{ess}(w) = \mathrm{exc}(w) \times \mathrm{rdf}(w) \qquad (5)$$

From the above, we can then easily construct a list of keywords per person, or a global list of keywords that can be used to start digging.

As mentioned earlier, the utility function presented in equation 2 was modified in this approach as equation 6.

$$\mathrm{util}(u) = \mathrm{ess}(u) \qquad (6)$$
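Equations 4 and 5 can be sketched as follows. The sketch assumes that each email is represented by its list of candidate keywords ordered from highest to lowest score, that each keyword occurs at most once per list, that the top third (at least one keyword) counts as ‘extracted’, and that every occurrence, extracted or not, counts towards rdf(w). The function name and the sample lists are hypothetical.

```python
from collections import Counter


def keyword_stats(ranked_keywords_per_email):
    """Return {keyword: {"exc": ..., "ess": ...}} following equations 4 and 5."""
    edf, rdf = Counter(), Counter()
    for ranked in ranked_keywords_per_email:
        cutoff = max(1, len(ranked) // 3)  # assumption: top third, minimum one
        for position, keyword in enumerate(ranked):
            rdf[keyword] += 1              # referenced in this email
            if position < cutoff:
                edf[keyword] += 1          # extracted in this email
    stats = {}
    for keyword in rdf:
        exc = edf[keyword] / rdf[keyword]  # equation 4
        ess = exc * rdf[keyword]           # equation 5 as written in the text
        stats[keyword] = {"exc": exc, "ess": ess}
    return stats


# Hypothetical ranked keyword lists, one per email, best keyword first.
emails = [
    ["expense report", "lunch", "friday"],
    ["audit", "expense report", "meeting"],
    ["expense report", "contract", "audit", "ferc", "travel", "hotel"],
]
stats = keyword_stats(emails)
top = sorted(stats, key=lambda k: (-stats[k]["ess"], k))
```

Sorting by essentiality places 'expense report' first in this sample; a per-person list is obtained in the same way by passing only the emails found in that person's mail directories.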

6. FINDINGS

The use of automated keyword extraction provided some interesting results, which are presented here. The results from the use of information retrieval techniques only provide candidate teams based on the hunches of the investigator, and then only based on simple keywords. Thus the investigator may potentially view data from third parties by following these hunches. Automated keyword extraction tries to reduce this error-prone process by extracting important phrases from the corpus, thereby allowing the investigator to focus attention on those phrases and words that make sense from the case point of view.

Firstly, because keyword extraction uses a statistical model on co-occurrence frequencies, there is no additional information on the semantics of any keywords or phrases that are identified. In an email corpus, this means that standard phraseology such as intended recipient, confidential information, and original message appears as top-ranked keywords in most profiles of email users within the corpus. This also means that standard platitudes, such as please find, keep well, would like and so on also appear frequently. This is not surprising, since these are standard ‘scaffolding’ when composing emails, which are in essence electronic letter writing.

Secondly, there appears to be strong observational evidence that the principle of Zipf’s law [26] applies to the keyword ranking per person from the corpus. Zipf’s law states that the frequency of a word in a corpus is inversely proportional to its rank in a frequency table of that corpus. This correlates with the observation above regarding the ‘scaffolding’ keywords. However, a thorough investigation is left for future work since Zipf’s law requires a minimum length document, and emails may not be a proper case for Zipf’s law.

Examination of the results of keyword extraction proceeded by choosing a random person from the corpus, and examining the top essential keywords for that person. Generic keywords (as mentioned above) were ignored, and topical keywords that seemed related to the functioning of a large energy corporation were used as anchors for future searches. As an example, the phrase ‘expense report’ was discovered using casual inspection (‘expense report’ appears in the top 100 keywords for persons a total of 36 times).

[Figure 4 graph: query [expense report], 15 of 87 nodes shown. Node labels (essentiality): derrick-j (96.00), kean-s (63.02), haedicke-m (18.18), lay-k (12.50), shapiro-r (55.00), beck-s (24.04), kaminski-v (19.36), horton-s (21.00), corman-s (20.00), buy-r (16.00), steffes-j (14.06), shankman-j (36.00), whitt-m (24.00), nemec-g, fossum-d (16.00)]

Figure 4: Expense Report

Figure 4 shows persons in the corpus who communicate, and provides the ‘essentiality’ index for each person for the query under consideration. By examining the top keywords from persons identified in the candidate team in Figure 4, we found “federal energy regulatory commission” as the thirty-first keyword with an essentiality ranking of 83.0 and an exclusivity index of 1.0 (using high index persons in the team). This indicates that in each email this person sent or received this keyword was extracted; it is thus a highly valuable keyword (see Figure 5 for the candidate team). ‘Federal Energy Regulatory Commission’ appears seven times in the top 100 keywords. Although it occurs far less often, its exclusivity index of 1.0 does indicate its importance when it does occur.

Using this approach (viewing top keywords, and inspecting the candidate team) it will be possible to quickly determine which keywords are pertinent to an investigation and which persons are of interest. It is also interesting to note that for Figure 4 there were several candidates who used the keyword ‘expense report’ but did not communicate with any of the other persons in the candidate team. Inspection of the emails revealed that it was a request to approve an expense report related to an employee that was sent to a departmental email address. This also indicates that potentially interesting ‘anomalies’ could be highlighted using the proposed techniques.

[Figure 5 graph: query [federal energy regulatory commission], 15 of 80 nodes shown. Node labels (essentiality): dasovich-j (494.00), kean-s (83.00), shapiro-r (32.00), sanders-r (24.00), kitchen-l (24.00), steffes-j (18.00), lokay-m (10.29), kaminski-v (25.00), hain-m (30.00), haedicke-m (26.00), sager-e (18.00), martin-t (19.00), arnold-j (15.00), hyatt-k (15.06), thomas-p (16.00)]

Figure 5: Federal Energy Regulatory Commission (using RAKE)

7. CONCLUSION

This paper reformulates the team formation problem within the DF paradigm. Since the team formation problem is well defined outside of the DF paradigm, it is necessary to place it within the DF context in order to understand it properly. This allows the finer nuances and requirements dictated by the DF paradigm to be understood. In turn, this allows future work to aim specifically at solving particular problems in light of the reformulation. In addition, the team formation problem allows the investigator to be guided by the data within the system. It is important to understand that the proposed techniques should not be considered to be an automated system for solving a cyber-crime; these techniques should only act as a guide for the investigator.

The team formation problem is thus considered from the suspect’s point of view: a crime is defined with respect to topics that are covered by the individuals in the organisation. The team formation problem then identifies the candidate team which would likely be able to complete the task (i.e. commit the crime).

This candidate team provides the investigator with clues about the individuals within the organisation that may have formed part of the team, or those that may have been used by the suspect in order to complete his task. The important contribution is that the investigator is provided with a guided approach to investigate a large volume of data, thereby focussing the investigation. Additionally, there is an important benefit for the privacy of third parties (persons whose emails form part of the seized corpus, but who have nothing to do with the act under investigation). There will be important implications for the investigator and investigation techniques, and further investigation here is also warranted.

The paper also defined formal notations and definitions as the starting point for reasoning and arguing about the team formation problem from the digital forensics perspective. This formal notation can be used as a foundation for future research in this paradigm.


Now that the team formation problem has been formulated for the DF paradigm, it becomes possible to define some future areas of research. These include using NLP for better topic extraction, such as noun-phrases or named entities. Once these have been extracted, the investigator can be presented with these ‘topics’ as a search filter. Such an approach would mean the investigator no longer needs to carefully craft the search terms, but can rely on the automated system.

We presented results using information retrieval techniques (using simple keyword searches), as well as using automated keyword extraction, which extracted phrases and keywords and ranked the relative importance of such phrases as an input to identifying candidate teams.

Future work would also include comparing the results from the techniques proposed herein to regular social network analysis techniques.

Rahman introduced the concept of the translucent team [19], in which a team has members that may withhold information from other team members. The effect of such a team within DF would be important to understand, since a cyber-criminal may employ such a team in order to commit a crime, thereby keeping knowledge of the crime away from those who may be able to provide evidence.

The prevalence of the mentioned ‘scaffolding’ text is a noise removal problem, and future work on removing this noise from the keywords (in the automated approach) could significantly reduce the number of uninteresting keywords.

REFERENCES

[1] B. K. L. Fei, J. H. P. Eloff, M. S. Olivier, and H. S. Venter, “The use of self-organising maps for anomalous behaviour detection in a digital investigation,” Forensic Sci. Int., vol. 162, no. 1-3, pp. 33–7, 2006.

[2] N. Beebe and J. Clark, “Dealing with Terabyte Data Sets in Digital Investigations,” in Advances in Digital Forensics. Springer US, 2005, vol. 194, ch. IFIP – The International Federation for Information Processing, pp. 3–16.

[3] G. Palmer, “A Road Map for Digital Forensic Research,” DFRWS, Utica, NY, Tech. Rep., 2001.

[4] M. Pollitt, “History, Historiography, and the Hermeneutics of the Hard Drive,” in Advances in Digital Forensics IX, G. Peterson and S. Shenoi, Eds. Seneca, SC, USA: Springer, 2013, pp. 3–19.

[5] N. Alherbawi, Z. Shukhur, and R. Sulaiman, “Systematic Literature Review on Data Carving in Digital Forensics,” Procedia Technology, vol. 11, pp. 86–92, 2013.

[6] A. Pal and N. Memon, “The Evolution of File Carving,” IEEE Signal Processing Magazine, vol. 26, no. 2, pp. 59–71, 2009.

[7] J. Kornblum, “Identifying almost identical files using context triggered piecewise hashing,” Digital Investigation, vol. 3S, pp. 91–97, 2006.

[8] V. Roussev, “An evaluation of forensic similarity hashes,” Digital Investigation, vol. 8, pp. S43–S41, 2011.

[9] T. Kohonen, “The Self Organising Map,” in IEEE. IEEE, 1990, pp. 1464–1480.

[10] M. Al Fahdi, N. Clarke, F. Li, and S. Furnell, “A suspect-oriented intelligent and automated computer forensic analysis,” Digital Investigation, vol. 18, pp. 65–76, Sep. 2016.

[11] N. L. Beebe and J. G. Clark, “Digital forensic text string searching: Improving information retrieval effectiveness by thematically clustering search results,” Digital Investigation, vol. 4, pp. 49–54, 2007.

[12] K. Balog, “People Search in the Enterprise,” Ph.D. dissertation, University of Amsterdam, 2008.

[13] A. Bozzon, M. Brambilla, S. Ceri, M. Silvestri, and G. Vesci, “Choosing the Right Crowd: Expert Finding in Social Networks,” in Proceedings of EDBT/CDT ’13, ser. EDBT ’13. New York, NY, USA: ACM, 2013, pp. 637–648. [Online]. Available: http://doi.acm.org/10.1145/2452376.2452451

[14] D. Horowitz and S. D. Kamvar, “The Anatomy of a Large-scale Social Search Engine,” in Proceedings of the 19th International Conference on World Wide Web, ser. WWW ’10. New York, NY, USA: ACM, 2010, pp. 431–440. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772735

[15] J. Zhang, J. Tang, and J. Li, “Expert finding in a social network,” in Database Systems for Advanced Applications (DASFAA’2007), 2007.

[16] J. J. Xu and H. Chen, “Fighting organized crimes: using shortest-path algorithms to identify associations in criminal networks,” Decision Support Systems, vol. 38, pp. 473–487, 2004.

[17] T. Lappas, K. Liu, and E. Terzi, “Finding a Team of Experts in Social Networks,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’09. New York, NY, USA: ACM, 2009, pp. 467–476. [Online]. Available: http://doi.acm.org/10.1145/1557019.1557074

[18] S. S. Rangapuram, T. Buhler, and M. Hein, “Towards realistic team formation in social networks based on densest subgraphs,” in WWW 2013. ACM, 2013, pp. 1077–1088.

[19] D. M. Rahman, “Team Formation and Organization,” Ph.D. dissertation, University of California Los Angeles, 2005.


[20] S. Beliga, “Keyword extraction: a review of methods and approaches,” University of Rijeka, Department of Informatics, Rijeka, 2014.

[21] M. Dredze, H. M. Wallach, D. Puller, and F. Pereira, “Generating summary keywords for emails using topics,” in Proceedings of the 13th international conference on Intelligent user interfaces. ACM, 2008, pp. 199–206.

[22] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford, “Okapi at TREC-3,” 1996, pp. 109–126.

[23] S. Rose, D. Engel, N. Cramer, and W. Cowley, “Automatic keyword extraction from individual documents,” in Text mining: applications and theory, M. W. Berry and J. Kogan, Eds. John Wiley & Sons, 2010.

[24] E. A. Hall, “The application/mbox Media Type,” Electronically, September 2005, http://datatracker.ietf.org/doc/rfc4155/.

[25] F. Liu, D. Pennell, F. Liu, and Y. Liu, “Unsupervised approaches for automatic keyword extraction using meeting transcripts,” in Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics. Association for Computational Linguistics, 2009, pp. 620–628.

[26] W. B. Cavnar and J. M. Trenkle, “N-Gram-Based Text Categorisation,” in SDAIR-94, Third Annual Symposium on Document Analysis and Information Retrieval, 1994, pp. 161–175.


PERSONAL INFORMATION AND REGULATORY REQUIREMENTS FOR DIRECT MARKETING: A SOUTH AFRICAN INSURANCE INDUSTRY EXPERIMENT

A. da Veiga* and P. Swartz**

* University of South Africa (Unisa), School of Computing, College of Science, Engineering and Technology, South Africa E-mail: [email protected]

** University of South Africa (Unisa), School of Computing, College of Science, Engineering and Technology, South Africa E-mail: [email protected]

Abstract: The processing of personal information by companies should be in line with ethical and regulatory requirements. Whilst respecting the right to privacy, personal information can be used to create value in the economy as well as on an individual level by tailoring and targeting services. However, personal information should not be processed under false pretences for the purposes of direct marketing. Data protection regulations, such as the Protection of Personal Information Act (PoPI) 2013, regulate the processing of personal information. Accordingly, companies domiciled in South Africa have to comply with the conditions of PoPI and must process personal information in line with the agreed purpose. PoPI will have an impact on direct marketing and certain conditions will apply to protect individuals’ personal information, as well as how and by whom it is used. This research sets out to investigate whether companies in the insurance industry are complying with the direct marketing conditions of PoPI pertaining to opt in and opt out preferences as well as a few other aspects. An experiment was conducted in South Africa whereby two new cellphone numbers and six new e-mail addresses were deposited in the economy by requesting online insurance quotes from twenty different insurance companies. For half of the online insurance quotes the researchers elected to opt in for direct marketing and for the other half to opt out. Any communication received on the cellphone numbers or e-mail addresses was recorded and analysed to establish if the preferences expressed were being complied with. The results indicate that data was shared and possibly leaked; this finding was based on the number of contacts received from companies that were not part of the sample. It was found that opt out preferences for direct marketing were not honoured by some companies. Other aspects, such as the availability of the option to opt in or opt out for direct marketing when depositing personal information on websites, secure processing of personal information and the use of privacy disclaimers, were also found to be lacking in some instances. This indicates that the insurance industry in South Africa might not yet be fully compliant with the requirements for direct marketing, as required by PoPI and the Consumer Protection Act (CPA). The results of the research can be used to improve direct marketing interactions with consumers, helping to ensure not only compliance with PoPI, but also the maintenance of a trusting relationship by respecting privacy.

Keywords: Protection of Personal Information Act; PoPI; direct marketing; opt in; opt out; personal information; privacy.

1. INTRODUCTION

“Everyone has the right to privacy” is enshrined in the Constitution of the Republic of South Africa (1996) [17, 59] and similar rights are regulated by means of privacy and data protection regulations in over a hundred countries [11]. Privacy is the right of the individual to be free from secret observation and to determine with whom, how and whether or not to share personal information [1]. For most people “privacy” is a meaningful and valuable “commodity”, but the term has different meanings in different contexts [2]. Privacy is an essential component of individual freedom, civil liberty, autonomy and dignity [3, 4]. The right to privacy is the right to an individual’s autonomy and personality, which is an individual’s general right [3]. Consumers’ privacy should be respected and balanced with societal and regulatory requirements and the value provided by companies when consumers share their personal information [58]. In terms of privacy, individuals have a reasonable expectation that companies such as cellphone and internet providers, banks, government institutions, medical practitioners, and retail and insurance organisations will secure their personal information [3]. However, the right to privacy in the

Based on: “Protection of Personal Information Act Compliance: A South African Insurance Industry Experiment”, by A. da Veiga and P. Swartz which appeared in the Proceedings of Information Security South African (ISSA) 2016, Johannesburg, 17 & 18 August 2016. © 2016 IEEE


digital world is under attack as tracking surveillance is increasing and individuals’ personal records are becoming more vulnerable while being stored digitally [3]. Contextual integrity is destroyed when digital information is sold or reused; even if users give their consent, they are not always aware of the purpose for which their information will later be used [5]. Moreover, the mismanagement of personal information when processing, storing, using, collecting or exchanging such information could violate human rights. It could also result in people losing trust in companies, especially if the information is not secured and processed in accordance with regulatory requirements and what the individual consented to [6]. While consumer data can be misused, it can also be used to the benefit of the individual and the future knowledge economy by extracting value from it. Better use of data through data value chains could benefit various industries, improve research and innovation and increase productivity [28]. Companies could use personal information to add value by directing specific services to consumers based on their profile and identified need, thereby enabling better strategic and operational decisions [44, 51]. Privacy and data protection legislation provides that the collection of personal information should be lawful and fair and that it should not be carried out under false pretences. This means that personal information collected from consumers for a specific service or product cannot be used for telemarketing or advertising without the consumer’s permission [52]. 
This condition imposed on the processing of personal information is encapsulated in international frameworks such as the Asia Pacific Economic Cooperation (APEC) Privacy Framework [52] and the Organisation for Economic Co-operation and Development (OECD) Privacy Guidelines [53], regulations such as the General Data Protection Regulation of Europe [14] and international standards such as the British Standard BS 10012:2009, Data protection – Specification for a personal information management system [54] and ISO IEC 29100.2 Information technology – Security techniques – Privacy framework [55].

In South Africa, the Protection of Personal Information Act (PoPI) (2013) was promulgated in November 2013 [7, 8, 17]. This Act regulates the processing of personal information by public and private organisations domiciled in South Africa. PoPI includes a condition relating to unsolicited marketing, namely, that consent is required in certain circumstances when existing or new customers are contacted. Companies must comply with the conditions of PoPI and may contact individuals in line with those conditions. Similarly, the Consumer Protection Act (CPA) of 2010 [23] gives consumers the right to restrict unwanted direct marketing targeted at them through media such as Short Message Service (SMS), e-mail or cellphone calls.

This research paper discusses research carried out to determine whether consumers’ opt in and opt out preferences are honoured in the flow of personal information in the insurance industry, as mandated by privacy legislation and, specifically, PoPI. The research results can provide the insurance industry with insight into possible gaps in compliance with PoPI when processing personal information of their customers or potential customers for marketing purposes. This research project forms part of a larger research project undertaken by honours students from the School of Computing at the University of South Africa (Unisa) as part of a BSc or BCom Honours degree [51].

The remainder of the research paper is structured as follows: section 2 gives an overview of international privacy legislation and is followed by an overview of PoPI. Section 3 discusses direct marketing with the possible implications of PoPI in section 4. The insurance industry is discussed in section 5. Section 6 presents the research questions followed by the research methodology in section 7 and the results of the experiment in section 8. A discussion of the findings, recommendations and limitations is presented in section 9, followed by the conclusion in section 10.

2. AN OVERVIEW OF PRIVACY LEGISLATION

2.1. International privacy legislation

The objective of privacy legislation is (i) to enable the individual to manage or control the flow of personal information and (ii) to give the individual autonomous space [12]. The growth of modern computing has resulted in data protection laws being implemented in many countries. In 1974 the United States of America drafted its privacy legislation. Germany followed in 1977 and France in 1978 [12]. Data protection laws have been adopted by over 100 countries [8, 11, 47]. India, as well as a number of countries in Africa and South America, is currently in the process of enacting privacy laws [47].

In the United States, the processing of personal information is regulated through a sectorial approach whereby privacy laws address a specific industry. Such laws include the Fair and Accurate Credit Transactions Act (FACTA) of 2003 [56], which focuses on the financial industry, and the Health Insurance Portability and Accountability Act (HIPAA) of 1996 [57], which focuses on health data. Studies indicate that Americans are mainly concerned about solicitation, government monitoring and the commercial use of personal data, whereas European citizens are found to be concerned about the collection and sharing of their personal information [46, 58].

The General Data Protection Regulation (GDPR) [14], which replaced the European Union’s (EU) Data Protection Directive 95/46/EC [12, 13], addresses new technological developments and harmonises national data protection laws across the European Union member states [15]. In addition, the EU-US Privacy Shield was implemented in 2016 to regulate the processing and flow of EU citizen data by US companies [45].

Most privacy or data protection laws are based on the Code of Fair Information Practices (FIP) [60], the OECD Privacy Guidelines [53] and APEC Privacy Principles [52]. PoPI in South Africa mirrors these international privacy principles.

2.2. Protection of Personal Information Act (PoPI), 2013

The purpose of PoPI is to give effect to the constitutional right to privacy by protecting the individual’s personal information when this is processed by a responsible party. In this context, the individual is referred to as a “data subject”; this is the “person to whom personal information relates” and who is an identifiable, living, natural or juristic person [17]. The responsible party is the “public or private body or any other person which, alone or in conjunction with others, determines the purpose of and means for processing personal information” [17].

“Personal information” is information relating to the data subject, such as biographical information (e.g. race, gender, marital status, disability or religion), education, medical or financial information, e-mail and physical addresses, biometric information, and even information about personal opinions and views, including correspondence [17].

PoPI regulates the manner in which personal information may be processed, in line with international standards and established conditions, and according to the prescription of the minimum threshold requirements for the lawful processing of personal information. The term “processing” means any action that is performed on the information throughout its life cycle, including “the collection, receipt, recording, organisation, collation, storage, updating or modification, retrieval, alteration, consultation or use; or dissemination by means of transmission, distribution or making available in any other form; or merging, linking, as well as restriction, degradation, erasure or destruction of information” [27]. PoPI also provides for the rights of data subjects, and the remedies available to them, to protect their personal information from processing that is not in accordance with the Act.

PoPI provides for the establishment of an information regulatory body with certain duties and powers in line with the conditions of PoPI and the Promotion of Access to Information Act (PAIA), 2000 [18]. The chairperson and members of the Information Regulator have been appointed as of December 2016 [19].

South African citizens’ personal information is also processed outside South Africa by multinational organisations and through the internet, which renders the information vulnerable [6]. The sensitivity of personal information changes as it flows through the economy; the security and privacy requirements are therefore dynamic [16], and the information should at all times be processed in line with the regulatory requirements of the relevant jurisdictions.

PoPI has a significant impact on an organisation’s policies, employees, information technology infrastructure, third-party service providers and procedures if the organisation aims to comply with the provisions of the Act [20]. Accordingly, the Act affects responsible parties that collect, process and store the personal information of customers, employees and third parties as part of their operational activities [7].

3. USE OF PERSONAL INFORMATION FOR MARKETING PURPOSES

In direct marketing, the marketer communicates directly with a customer or client in the hope that the customer will respond positively to the marketer’s request [29]. Any type of electronic communication – such as an SMS to a cellphone, e-mails, mobile device application advertising and social media marketing – serves as a tool used by the marketer to advertise services or products. In a study conducted by Microsoft it was found that consumers are willing to share their personal information if they are explicitly asked for permission and if there is clear benefit in return [49]. According to the parliamentary text of the GDPR, consent must be explicit, must indicate affirmative agreement from the data subject, and is valid as long as the personal information is processed for the purpose for which it was collected. Ethical considerations arise when personal information, which has been collected for a specific purpose to which the consumer agreed, is used for another purpose without consent [48].

According to section 69 of PoPI, the customer must grant permission for the processing of personal information and must also have the option to cease any communications [17]. The consent option for processing personal information is referred to as “opt in”, and the rejection of future communications from the marketer is referred to as the “opt out” option.

Consumers are sometimes misled about their choice to opt in or opt out on companies’ websites or application forms. For example, the default setting on most websites is set to opt out, or the questions that are asked (“Please send me newsletters” or “Please do not send me newsletters”) are trivial and might influence consumer decisions [31]. Because of inattention, and cognitive and physical laziness, default answers are given. Where an opt in option is given, it is often ticked by default when marketers need consumers to opt in for the processing of personal information [32]. As such, compliance with PoPI could impact negatively on companies’ freedom to use marketing and communication initiatives.

Section 11 of the CPA [23] stipulates that every person has the right to privacy and to refuse unwanted SMSs, cellphone calls, letters or “spam” e-mails. This gives consumers the right to “opt out” of direct marketing communications, whereafter companies or suppliers may not continue to contact the consumer. In support of this, the CPA provides for the establishment of a “do-not-contact registry” whereby consumers can register to opt out of all unsolicited marketing. This has, however, not yet been implemented. The Direct Marketing Association of South Africa (DMA) currently maintains a register of consumers who do not want to be contacted for direct marketing, but this opt out applies only to organisations that are registered with the DMA [62]. These number approximately 373 companies, including the major banks in South Africa, as listed on the DMA website.

The CPA has a consumer protection focus and does not focus on the security of personal information or the lawful requirements for usage of personal information. PoPI addresses these limitations and also focuses on unsolicited marketing, but from an opt in and opt out perspective. According to section 69 of PoPI, the processing of personal information for the purpose of direct marketing is prohibited, unless the marketer has the consent of the data subject, or the data subject is a customer of the responsible party, the responsible party has the customer’s contact details and it markets similar products or services to the data subject. Responsible parties may contact new customers only once for direct marketing, whereafter the consumer must opt in to receive further marketing communication. In addition, PoPI requires responsible parties to inform data subjects of the source from which they collected their personal information in support of openness and transparency (PoPI s18(1), [17]).
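The section 69 conditions summarised above can be read as a small decision rule. The following Python sketch is only an illustration of that summary, not a statement of the Act itself: the class `DataSubject`, its fields and the function `may_send_direct_marketing` are hypothetical names introduced here, and the treatment of an existing customer who is offered a dissimilar product (handled like a new contact) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DataSubject:
    """Marketing-relevant state of one data subject (all fields are hypothetical)."""
    opted_in: bool = False        # consent to direct marketing has been given
    opted_out: bool = False       # the data subject has objected to marketing
    is_customer: bool = False     # existing customer whose contact details are held
    prior_approaches: int = 0     # unsolicited marketing contacts already made


def may_send_direct_marketing(subject: DataSubject, similar_product: bool) -> bool:
    """Return True if a marketing contact fits the summary of section 69 above."""
    if subject.opted_out:
        # An opt out overrides every other ground.
        return False
    if subject.opted_in:
        # Explicit consent permits further marketing.
        return True
    if subject.is_customer and similar_product:
        # Existing customers may be marketed similar products or services.
        return True
    # Assumption: everyone else may be approached once, after which an
    # opt in is required.
    return subject.prior_approaches == 0
```

Under this reading, a new data subject may be approached once (`may_send_direct_marketing(DataSubject(), False)` is true), but a second unsolicited approach without an opt in is not permitted.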

4. PRACTICAL IMPLICATIONS OF POPI

PoPI will have a positive impact from an organisational and data subject perspective as discussed below.

Preventive measures: Responsible parties who collect personal information must be accountable and transparent, and should safeguard personal information according to condition 7 of PoPI [34]. According to De Bruyn [7], companies are now implementing proactive technical and organisational measures in the hope that these will prevent the leaking of personal information. These measures should ensure that companies’ databases are secure in order to prevent data leakage and to protect their investments.

Transparency: Another advantage, according to De Bruyn [7], is that companies will be more transparent in terms of how, what and where personal information is stored within the company. Companies must notify data subjects when personal information is processed (s 18, [17]), and data subjects have the right to opt in or out, free of charge, in respect of receiving marketing communication (s 69, [17]). Consent must be given before personal information is shared with third parties for marketing purposes (ss 11 and 20, [17]); therefore data subjects should not under normal circumstances receive unsolicited SMSs, phone calls or e-mails [7]. All businesses or parties responsible for big data and the analysis of an individual’s habits, purchase behaviours or health status must be transparent in their use of the personal information, ultimately protecting the right of the individual while abiding by ethical principles [21].

Individuals’ rights: If data are inaccurate, misleading, excessive or incomplete, or if data have been obtained unlawfully, data subjects can rightfully request an update, deletion or correction of their personal information according to section 16 of PoPI [17, 22]. Wilson [21] argues that the laws protecting the privacy of personal data give individuals the rights to all their data, irrespective of the source. PoPI also enables individuals to institute civil proceedings under certain circumstances if there has been interference with the protection of their personal information (ss 5 and 99, [17]).

Whilst there are benefits attached to protecting the privacy of consumers, many companies believe that PoPI will have a negative effect on them as explained below.

Marketing costs: The CPA [23] only allows for an opt out mechanism. Section 11(5) of the Act states that if a consumer opts out of receiving direct marketing, they cannot be charged a fee for doing so. PoPI stipulates that affirmative consent is required, which means that individuals have to opt in to receive direct marketing messages (s 69, [17]). PoPI also requires that the customer be given reasonable time to object, at no cost to the data subject, which means that the business is responsible for all costs if the customer opts out at a later stage [24]. Companies must update their IT systems to flag the option to opt in or opt out of direct marketing (s 11, [17]). Company processes for responsible parties and third parties must also be updated according to section 13 of PoPI, with the provision that personal information can be shared only if the purpose is specific, the quality of information is assured (s 16, [17]) and the information is safeguarded (s 19, [17]). This has an impact on IT system designs, administration and governance processes, and contractual processes with third parties.

Operational costs to companies: Critics have warned that the PoPI regulatory scheme will discourage economic activity and place undue burdens on businesses, because many businesses will have to make supplementary investments in information technology systems or use third-party vendors in order to comply with PoPI [25].

Compliance time frames: To be compliant within one year is impracticable, as shown by a survey conducted among South African businesses in 2013, and it could take up to three years to become fully compliant [10]. Companies have to overcome huge challenges to become compliant and need to start before the implementation of PoPI. Moreover, companies that are already implementing measures to comply with PoPI requirements are concerned that they will not be compliant in time [10]. A study conducted by Cibecs in 2012 shows that 26% of South African companies are in the process of complying with the requirements of PoPI [37]. They found that as many as 38% of the companies surveyed still have out-dated security measures in place. It therefore seems as though company efforts to comply with PoPI are still in progress [2, 37].

5. USE OF PERSONAL INFORMATION BY THE INSURANCE INDUSTRY

The insurance industry processes large quantities of personal information for the purposes of underwriting [63]. To be competitive in the insurance industry, companies have to market their products. One marketing method used by insurance companies is cold-calling. According to Millard [33], although the Financial Advisory and Intermediary Services Act (FAIS) 37 of 2002 [34] and the CPA [23] address this issue, PoPI will eliminate the cold-calling sales technique altogether. Section 69 of PoPI prohibits unsolicited marketing unless the customer (data subject) consents to it [17].

According to a global study done in the health services, 6% of data breaches are committed by insurance companies, the third highest out of 17 industries [35]. Cybersecurity insurance is expanding rapidly, with annual sales in the global cyber insurance market forecast to reach $7.5 billion by 2020 [36, 37]. If an insurance company wants to expand its products and provide cybersecurity insurance in South Africa, it must set an example and comply fully with the requirements of PoPI, especially if it plans to use direct marketing to create awareness about its cybersecurity insurance products.

Many companies in South Africa believe that it will require significant effort to become PoPI compliant, with some estimating that it could take in excess of 9 000 hours [10]. While some companies have started with the implementation process, research indicates that it could take more than a year to become fully compliant, while many companies believe that it could take three to five years to achieve this [10, 30]. Although it is thought that companies have started the process of implementing the conditions of PoPI, many might not yet have done so. Once the provisions of PoPI come into effect, companies will have one year to comply with the Act.

6. RESEARCH QUESTIONS

The following main research question has therefore been formulated:

- Do South African insurance companies only contact customers if they have opted in for marketing and communication purposes as required by PoPI?

The answer to this research question could indicate to insurance companies whether they are ready to comply with certain conditions for marketing in PoPI. While establishing the answer to the research question, the experiment also allows the following sub-questions to be answered:

a. Do all companies in the sample include an opt in or opt out preference for direct marketing on their websites when collecting personal information of data subjects for the purpose of an online insurance quote?

b. Did companies that were not part of the sample contact the data subject?

c. Do all companies in the sample have a privacy disclaimer or policy on their website?

d. Did all SMSs received include an opt out preference, free of charge to the data subject?

e. Do all companies in the sample use a secure method to process the data subject’s personal information when collecting their personal information via an online insurance quote?

7. RESEARCH METHODOLOGY

The research methodology used is based on an experimental design. The researchers made use of two new cellphone numbers with related e-mail accounts that were created for the purpose of the experiment. These were used as contact information when requesting online quotes from 20 insurance companies. For the one cellphone number the researchers aimed to opt in and for the other to opt out for direct marketing communication. The researchers then monitored communication received on the cellphone numbers and related e-mail accounts to establish if the opt in and opt out preferences were honoured as well as to examine a few other aspects which were considered as part of the experiment.

The next section provides a detailed overview of the research methodology.

7.1 Research paradigm

A positivist paradigm applies to this research. A positivist paradigm is based on realist ontology beliefs, where there is an objective reality according to representational epistemology in terms of which symbols are used to explain and describe the objective reality accurately [38, 39]. Cohen and Crabtree [39] state that positivism can reveal the causal relationship that exists in social life, such as the flow and use of personal information in the economy.

7.2 Research design

De Villiers [50] explains that a positivist paradigm is one where knowledge is created through the application of mainly empirical methods that could include experiments whereby reliable, consistent and unbiased data are obtained. An experimental design was used for this research project. Experiments are defined by Payne as, “ways of assessing causal relationships, by randomly allocating ‘subjects’ to two groups and then comparing one (the ‘control group’) in which no changes are made, with the other (the ‘test group’) who are subjected to some manipulation or stimulus” [64]. This design allowed the researchers to have control over the experiment which also strengthens the internal validity [40]. Miller and Brewer [41] suggest that if the experiment is carried out correctly, the testing effect, mortality, history and maturation, as possible pitfalls of internal validity, will not have an effect on the research outcome from an internal validity perspective. Two groups were involved in the research, namely, the experimental and the control group; a stimulus was applied to the experimental group and no stimulus was applied to the control group [41].

7.3 Control group

In this research, the control group comprised four new SIM cards that were purchased, one from each of the major cellphone providers in South Africa, referred to as cellphone provider I, II, III and IV. The associated cellphone numbers were not deposited in the economy and were not used to obtain any online insurance quotes or to make any phone calls. For the purpose of this experiment the lecturers involved in the project were responsible for monitoring the control group.

7.4 Experimental group

To conduct the experiment, two new cellphone numbers (cellphone A and B) and six new e-mail addresses (e-mails A, B, C, D, E and F) were utilised, which allowed the researchers to supply personal information when requesting online quotations from the sample of insurance companies. Cellphones A and B were obtained from the same cellphone provider, referred to as cellphone provider IV. In this group project twenty students participated, each obtaining two new cellphone numbers from the various cellphone providers to conduct the experiment. The scope of this paper is, however, limited to only one instance of the experiment and thus only the experimental results of this specific insurance industry experiment are reported on.

In the case of the experimental group, the newly purchased cellphone numbers were deposited in the economy. External validity could affect the experiment where the experimental and control groups are not identical to start with [41]. Some factors that could affect the external validity are the processes followed by the various retail outlets where the cellphone numbers were obtained, which could lead to data leakage or sharing. In addition, some cellphone numbers might relate to numbers that are reused by the cellphone providers, which could affect the results as the previous owner would have already shared the cellphone number with organisations that might conduct direct marketing. These factors were considered by the researchers.

7.5 Sample

This research focused on the insurance industry in South Africa. The insurance industry collects personal information from online applications, telephonic marketing and hard-copy applications, as well as their claim processes.

The geographical area was limited to South Africa. The head offices of the insurance companies included in the sample are mainly located in the metropolitan areas of each province.

Twenty insurance companies were identified and included in the sample. The sampling method used for this research project was a convenience sample [36]. A prerequisite for inclusion in the sampling was that the insurance company had a website where online insurance quotes could be requested as part of the experiment. This experiment was conducted from a consumer perspective and hence, to protect the confidentiality of the companies in the sample, the company names are withheld.

7.6 Experiment Preparation

Table 1 shows the two new cellphone numbers, Cellphone A and Cellphone B, which were purchased in March 2015 for the purpose of this research. These cellphone numbers were purchased under the personal information of one of the researchers. For the remainder of the discussion this profile will be referred to as the “data subject”. The researchers used the same profile in the dealings with all the insurance companies selected for the research. When the researchers purchased the SIM cards from the service providers, personal information was verified in line with the Regulation of Interception of Communications and Provision of Communication-Related Information Act (RICA), 2002 [43]. The six e-mail addresses were created in April 2015, each using a different internet service provider. Two of the six e-mail addresses were linked to the cellphone numbers A and B (see Table 1). Cellphone numbers A and B and the related e-mail addresses A and B were deposited at the first 10 insurance companies, A to J.

Table 1: Cellphone numbers and sample companies

The remaining four e-mail addresses (e-mails C, D, E, and F) were included in the personal information supplied to the next ten insurance companies (K to T) in the sample. The researchers set out to use these e-mail addresses for opt in and opt out preferences without any cellphone numbers being linked to the e-mail addresses. However, it was found that no information could be submitted (no online insurance quote could be obtained) without providing a cellphone number. Cellphone A was therefore also submitted with e-mail addresses C and D, and cellphone B was submitted with e-mail addresses E and F.

7.7 Conducting the Experiment

Personal information was deposited in the insurance market in May 2015. The method used to deposit personal information was to request life insurance or short-term policy quotations from insurance companies using the online application process on the companies’ websites. For the duration of this research project, the cellphone numbers and e-mail accounts were not used for any other purpose.

Figure 1 shows the personal information fields that the insurance companies requested in order to process the insurance quotes. All 20 companies required a name and surname followed by the date of birth. Less than half required the personal identification number and some required the data subject’s occupation and income. These fields of personal information were therefore included in the customer records in the companies’ databases. In this way the personal information of the profile used was deposited in the economy, and the researchers were able to monitor the flow of the personal information through the communications received on the new cellphone numbers and e-mail addresses.

[Bar chart: number of companies (out of 20) requesting each field. Name and surname 20; Birth date 13; Gender 11; ID 9; Delivery address 8; Occupation 6; Income 6.]

Figure 1: Personal information deposited

The researchers aimed to opt in for direct marketing for cellphone A and to opt out for cellphone B. However, opt in and opt out preferences were provided on only six of the insurance company websites. Where the opt in or opt out preferences were not provided on the websites, the researchers still requested online insurance quotes.

The researchers activated the control group cellphone numbers on the network by sending at least one SMS to another number. No further stimuli were applied to these numbers: the researchers did not deposit the cellphone numbers with any company, nor did they otherwise use the associated cellphones for phone calls or text messaging. This was intended to avoid biased results, because the control group cellphone numbers were not subjected to any experimental treatment.

7.8 Data Collection

Data were collected by means of cellphone calls, SMS’s and e-mail messages received from companies that contacted the data subject on either of the two cellphone numbers or any of the six e-mail addresses created for this experiment. The cellphone calls were answered by the researchers and were received from just after eight in the morning until four in the afternoon. Information about each cellphone call and SMS was recorded daily on a spreadsheet, and information about e-mail messages received was recorded twice a week.

The time frame for collection was from May to October 2015. During this time, the researchers recorded certain aspects, such as the origin of contact details; whether the data subject opted in for the communication; whether there was an option to opt out of any future communication; whether the company was one of the companies in the sample; whether the data subject was liable for any cost when opting out; and whether the data subject was contacted by an automated calling machine.
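The aspects recorded for each contact map naturally onto one record per contact. The following is a minimal sketch in Python of what one row of such a spreadsheet might look like. The class name, the field names and the example values are hypothetical and are not taken from the researchers' spreadsheet.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Channel(Enum):
    CALL = "cellphone call"
    SMS = "SMS"
    EMAIL = "e-mail"


@dataclass(frozen=True)
class ContactRecord:
    """One logged contact; all field names are illustrative assumptions."""
    received_on: date
    channel: Channel
    sender: str                   # company label, or "unknown"
    origin_known: bool            # could the origin of the contact details be established?
    opted_in: bool                # had the data subject opted in for this communication?
    opt_out_offered: bool         # did the message offer a way to opt out?
    sender_in_sample: bool        # was the sender one of the 20 sample companies?
    opt_out_cost: Optional[bool]  # True if opting out costs money; None if no opt out offered
    automated_call: bool          # was the contact made by an automated calling machine?


def is_unsolicited(record: ContactRecord) -> bool:
    """In this sketch a contact counts as unsolicited when there was no opt in."""
    return not record.opted_in


# Hypothetical example row (values invented for illustration only).
example = ContactRecord(
    received_on=date(2015, 6, 1),
    channel=Channel.SMS,
    sender="unknown",
    origin_known=False,
    opted_in=False,
    opt_out_offered=True,
    sender_in_sample=False,
    opt_out_cost=True,
    automated_call=False,
)

print(is_unsolicited(example))
```

Keeping every aspect as an explicit field makes later tallies, such as contacts per channel or per sender category, a matter of filtering the records.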

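| SIM cards (2 in total) | E-mails (6 in total) | Marketing opt in/opt out | Company |
|---|---|---|---|
| Cellphone A | E-mail A, Yahoo | Opt in | A to J |
| Cellphone A | E-mail C, Gmail | Opt in | K to T |
| Cellphone A | E-mail D, Yahoo | Opt in | K to T |
| Cellphone B | E-mail B, Gmail | Opt out | A to J |
| Cellphone B | E-mail E, Outlook | Opt out | K to T |
| Cellphone B | E-mail F, Hotmail | Opt out | K to T |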


8. RESULTS

8.1. Overall contacts received

In total, the data subject was contacted 84 times during the data collection period on either the cellphone numbers or e-mail addresses created for the experiment. Fifty-five per cent of all communications were received via SMS. Twenty-eight per cent were received via e-mail messages when quotations had been requested from insurance companies. Cellphone calls accounted for only 17% of the contacts (see Figure 2). These calls were received from the insurance companies that called about the quotations requested.

[Chart: number of SMS contacts (46), number of cellphone contacts (14), number of e-mail contacts (24).]

Figure 2: Number of contacts received per method
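The share of each contact method follows directly from the counts in Figure 2. A short Python sketch of the arithmetic is given here; the variable names are illustrative, and the computed shares are within a percentage point of the rounded figures quoted in the text.

```python
# Contact counts per method, as shown in Figure 2.
counts = {"SMS": 46, "cellphone call": 14, "e-mail": 24}

total = sum(counts.values())

# Percentage share of each method in the total number of contacts.
shares = {method: 100 * n / total for method, n in counts.items()}

for method, n in counts.items():
    print(f"{method}: {n} of {total} contacts ({shares[method]:.1f}%)")
```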

Only 22% (10 out of 46) of the SMS’s were sent by those insurance companies that had the personal information of the data subject, whereas eight of the 46 SMS’s were sent by the cellphone service provider. The remaining 28 of the 46 SMS’s received came from entities that were not part of the sample and hence had no permission to contact the data subject nor did they have any information about the data subject.

8.2. Contacts received from companies in the sample and not in the sample

Forty-eight of the 84 contacts were linked to companies included in the sample and eight contacts were received from the service providers.

Twenty-eight contacts (excluding the service provider contacts), or 33% of the total, were from companies that were not part of the 20 companies in the sample. These contacts represented 18 different companies. This indicates that the information could have been shared with third parties who used it to contact the data subject, as these companies did not have permission to contact the data subject for marketing purposes via the cellphone numbers or e-mail addresses used in the research. Most of these companies contacted the data subject only once, but two companies contacted the data subject at least eight times each during the experiment to offer financial services or to notify the data subject that they had won a competition.
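The breakdown by sender category can be cross-checked against the overall total and against the SMS figures in section 8.1. The sketch below uses only counts stated in the text; the dictionary keys are illustrative labels.

```python
# All contacts by sender category (section 8.2).
by_sender = {"sample companies": 48, "service providers": 8, "not in sample": 28}
# SMS contacts by sender category (section 8.1).
sms_by_sender = {"sample companies": 10, "service providers": 8, "not in sample": 28}

total_contacts = sum(by_sender.values())
total_sms = sum(sms_by_sender.values())

# Share of all contacts that came from companies outside the sample.
not_in_sample_share = by_sender["not in sample"] / total_contacts
# Share of SMS contacts sent by companies that held the data subject's details.
sample_sms_share = sms_by_sender["sample companies"] / total_sms

print(f"{total_contacts} contacts, {not_in_sample_share:.0%} from outside the sample")
print(f"{total_sms} SMS contacts, {sample_sms_share:.0%} from sample companies")
```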

8.3. Contacts received for opt in and opt out preferences

• Opt in group contacts received

The data subject received 28 contacts in the opt in group. This excludes contacts from the cellphone service provider and companies not in the sample. These were from a total of 14 companies. Of these 14, only two companies provided the option for the data subject to actually opt in on their website when requesting the online quote, namely company O, which contacted the data subject in seven instances, and company F, which contacted the data subject on one occasion. The 12 companies that did not provide for an opt in option still contacted the data subject. This is not a concern if the data subject planned to opt in, but it is a concern if the data subject planned to opt out. It indicates compliance with the CPA where the opt out is included, but not with the requirements of PoPI that require opt in.

• Opt out group contacts received

The opt out group received a total of 20 contacts emanating from seven different companies. This excludes contacts from the cellphone service provider and companies not in the sample. Of these seven companies, only two provided the option on their website to opt out. However, these two companies, namely companies C and O, still contacted the data subject for direct marketing. The other five companies, A, D, E, K and P, contacted the data subject without having given the option to opt out on their website when requesting the online insurance quote. This is a concern if the data subject planned to opt out, as the option was not provided and the companies still contacted the data subject. Three of these companies, namely companies A, K and P, though, had the opt-out preference in the SMS’s they sent, which indicates that they comply with the CPA.

Table 3 provides a summary per company of the number of times the data subject was contacted, by e-mail, SMS or cellphone calls. This table includes all contacts for websites, whether the opt in and opt out preference was available or not.

Table 3: Number of contacts received per company for opt in and opt out preferences (excluding service provider contacts and companies not in the sample)

| Company name | Opt in number of contacts | Opt out number of contacts |
|---|---|---|
| Company A | 1 | 3 |
| Company B | 0 | 0 |
| Company C | 0 | 1 |
| Company D | 1 | 1 |
| Company E | 1 | 1 |
| Company F | 1 | 0 |
| Company G | 2 | 0 |
| Company H | 0 | 0 |
| Company I | 1 | 0 |
| Company J | 2 | 0 |
| Company K | 1 | 8 |
| Company L | 0 | 0 |
| Company M | 2 | 0 |
| Company N | 2 | 0 |
| Company O | 7 | 5 |
| Company P | 3 | 1 |
| Company Q | 2 | 0 |
| Company R | 0 | 0 |
| Company S | 2 | 0 |
| Company T | 0 | 0 |
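The group totals and company counts quoted in section 8.3 can be re-derived from Table 3. The following Python sketch transcribes the table and tallies it; the variable names are illustrative.

```python
# Table 3 as (opt in contacts, opt out contacts) per company.
table3 = {
    "A": (1, 3), "B": (0, 0), "C": (0, 1), "D": (1, 1), "E": (1, 1),
    "F": (1, 0), "G": (2, 0), "H": (0, 0), "I": (1, 0), "J": (2, 0),
    "K": (1, 8), "L": (0, 0), "M": (2, 0), "N": (2, 0), "O": (7, 5),
    "P": (3, 1), "Q": (2, 0), "R": (0, 0), "S": (2, 0), "T": (0, 0),
}

# Total contacts received in each group.
opt_in_total = sum(opt_in for opt_in, _ in table3.values())
opt_out_total = sum(opt_out for _, opt_out in table3.values())

# Companies that made at least one contact in each group.
opt_in_companies = [c for c, (opt_in, _) in table3.items() if opt_in > 0]
opt_out_companies = [c for c, (_, opt_out) in table3.items() if opt_out > 0]

print(opt_in_total, len(opt_in_companies))
print(opt_out_total, opt_out_companies)
```

The tallies agree with the text: 28 opt in contacts from 14 companies, and 20 opt out contacts from the seven companies A, C, D, E, K, O and P.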

The results indicate that just over half of the contacts made by Company O (7 of 12) were permitted and used e-mail address C or D. There was no consent for the other five communications received from Company O, as the data subject had opted out when using e-mail addresses E and F. There was no consent for 89% (8 out of 9) of contacts made by Company K, even though there was a privacy disclaimer on the website regarding the protection of personal information that said that customers would be contacted only about a requested quotation.

Company A contacted the data subject four times. This company also had a privacy disclaimer on its website, indicating that it would protect the data subject’s personal information and would contact the client only about the quotation being requested. The data subject elected to opt out of communication from Company A, but no option was provided to opt out during the application process. The websites of Company A and Company K did not offer opt in or opt out options on their application/quotation systems, but they did include privacy disclaimers that promised to protect the customer’s personal information. Thirty-eight per cent of all the companies that contacted the data subject did not have the data subject’s personal details, and it was unknown how the contact details had been obtained for 35% of the communications received.

Only 43% of the SMS’s received included the option to opt out of communications. Most of the 43% of the SMS’s that included the option to opt out indicated that standard rates would apply to opt out. None of the phone calls received were from an automated calling machine.

Table 4 gives a summary of all the contacts received for cellphones A and B per month without distinguishing between the companies that provided the option to opt in or opt out on their websites. Most of the opt in group contacts were received in the first month for cellphone A. The data indicate that both cellphone numbers received communications.

Table 4: Contact overview per cellphone

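| Cellphone | Month | Contacts from sample | Service provider contacts | Not in sample contacts |
|---|---|---|---|---|
| Cellphone A | May | 17 | 0 | 0 |
| Cellphone A | June | 10 | 0 | 2 |
| Cellphone A | July | 0 | 2 | 1 |
| Cellphone A | Aug | 0 | 0 | 0 |
| Cellphone A | Sep | 1 | 0 | 2 |
| Cellphone A | Total | 28 | 2 | 5 |
| Cellphone B | May | 3 | 0 | 0 |
| Cellphone B | June | 14 | 0 | 8 |
| Cellphone B | July | 3 | 6 | 6 |
| Cellphone B | Aug | 0 | 0 | 4 |
| Cellphone B | Sep | 0 | 0 | 5 |
| Cellphone B | Total | 20 | 6 | 23 |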
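The monthly figures in Table 4 can be summed to check the column totals and the overall number of contacts. In the sketch below each row is (contacts from sample, service provider contacts, not in sample contacts). One assumption is made: every August value for Cellphone A is taken as 0, which is what the printed column totals imply.

```python
MONTHS = ["May", "June", "July", "Aug", "Sep"]

# Rows follow MONTHS; columns are (from sample, service provider, not in sample).
cellphone_a = [(17, 0, 0), (10, 0, 2), (0, 2, 1), (0, 0, 0), (1, 0, 2)]  # Aug assumed all zero
cellphone_b = [(3, 0, 0), (14, 0, 8), (3, 6, 6), (0, 0, 4), (0, 0, 5)]


def column_totals(rows):
    """Sum each column of a list of equal-length tuples."""
    return tuple(sum(column) for column in zip(*rows))


totals_a = column_totals(cellphone_a)
totals_b = column_totals(cellphone_b)
grand_total = sum(totals_a) + sum(totals_b)

print(totals_a, totals_b, grand_total)
```

The column totals match the Total rows of Table 4, and the grand total matches the 84 contacts reported in section 8.1.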

8.4. Control group

The control group received a total of 70 communications: nine missed calls and 61 SMS’s. This was almost as many as the experimental group received; however, 55 of the contacts were from the cellphone service providers. Three of the cellphone numbers (Cellphone Provider I; Cellphone Provider II; and Cellphone Provider III) did not receive any communication from other companies. The cellphone number from Cellphone Provider IV accounted for nine missed calls, of which six were from different numbers, as well as six SMS’s from other companies. These SMS’s were messages from financial service providers or a message that the data subject had won a competition.

This cellphone number might have been owned and used by another individual in the past, which could explain the contacts from other organisations. Alternatively, it could indicate that the data subject’s information was leaked at the cellphone provider or retail store where the SIM card was purchased and not necessarily by the insurance companies. This could be further investigated in future research by increasing the sample of the control and experimental groups to determine the source of contacts and whether the cellphone numbers are linked to a marketing database.
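Although the control and experimental groups received a similar number of contacts, their composition differed. The following sketch compares the share of service provider contacts in each group, using only counts stated in the text; the function and variable names are illustrative.

```python
def provider_share(provider_contacts: int, total_contacts: int) -> float:
    """Fraction of a group's contacts that came from cellphone service providers."""
    return provider_contacts / total_contacts


# Control group: 70 contacts, 55 of them from service providers (section 8.4).
control_share = provider_share(55, 70)
# Experimental group: 84 contacts, 8 of them from service providers (section 8.2).
experimental_share = provider_share(8, 84)

# Control group contacts not from service providers:
# nine missed calls plus six SMS's on the Cellphone Provider IV number.
control_other = 70 - 55

print(f"control: {control_share:.0%}, experimental: {experimental_share:.0%}")
print(f"control contacts not from service providers: {control_other}")
```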


9. DISCUSSION

The data subject received contacts from companies where the opt out preference was applied as well as contacts from companies that did not give the data subject the option to opt in or opt out when requesting an online insurance quote. This answers the main research question, namely, “Do South African insurance companies only contact customers if they have opted-in for marketing and communication purposes as required by PoPI?”, showing that consumers are contacted even if they have opted out of direct marketing. Section 69(1) of PoPI stipulates that data subjects must give their consent to the responsible party to process their personal information and must opt in for marketing purposes.

At the time when the data was submitted via the insurance companies’ websites, only six of the 20 companies made provision for consumers to opt in or opt out for any marketing communications. This answers sub research question a, “Do all companies in the sample include an opt in or opt out preference for direct marketing on their websites when collecting personal information of data subjects for the purpose of an online insurance quote?” In future, companies will have to give new customers the option to opt in for marketing and communication, and allow existing customers to opt out at any time for such purposes, as per section 69 of PoPI [17].

An additional 18 companies that were not part of the sample also contacted the data subject, which answers sub research question b, which aimed to establish if companies that were not part of the sample might contact the data subject for direct marketing. Direct marketing from the companies that were not part of the sample is not in compliance with the requirements of PoPI, as these companies contacted the data subject via SMS for marketing purposes without having consent to do so.

Only two companies included a privacy disclaimer on their website, stating that they valued personal information, would protect it and would contact customers only about the product or service they were interested in. The remainder of the companies did not comply with section 18 of PoPI, which requires that the data subject must be aware of the purpose of information collection and other aspects of processing in terms of transparency and openness requirements [17]. Research question c, intended to establish if privacy disclaimers or policies were available on the sample company websites, has thus been answered.

According to section 69(4b) of PoPI, the responsible party or third parties responsible for direct marketing must supply their address or contact details to enable recipients to opt out of any future communication [17]. The 43% of companies that sent SMS’s without an opt out option therefore did not comply with PoPI or the CPA. In addition, the data subject had to pay standard SMS rates to opt out of the other SMS’s. This answers sub research question d, which aimed to establish if all SMS’s received included an opt out preference, free of charge to the data subject. Data subjects must be given the option to opt out of or withdraw their consent for the processing of information and future marketing communications from third parties as per section 69(4b) of PoPI [17].

Because SMS’s were received from unknown senders as well as from companies that were not included in the sample, it was difficult to establish the origin of all messages or the ways in which personal information was leaked to or shared with these entities; the researchers were in no position to confirm how each entity obtained the information used to make the contact. However, these messages indicated that data, specifically personal information, were shared in the economy with third parties, as the cellphone numbers and e-mail addresses were used only when submitting the information on the websites of insurance companies included in the sample.

The last sub research question, e, aimed to establish if all companies in the sample used a secure method to process the data subject’s personal information when an online insurance quote was requested. Thirty-five per cent of the companies did not use a secure website when processing personal information when the online insurance quotes were obtained. Responsible parties are required, according to section 19 of PoPI, to ensure the integrity and confidentiality of personal information which they process [17]. This is a vulnerability that could result in unauthorised access to confidential information, such as income or health status, of the data subject.
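One simple first indicator of whether a quotation form is served securely is the URL scheme. The sketch below, in Python, flags form URLs that do not use https. The company labels and URLs are hypothetical placeholders, not the sites examined in the study, and this is not necessarily how the researchers assessed the websites. A scheme check alone does not validate the certificate or show where the form data is actually submitted.

```python
from urllib.parse import urlparse


def uses_https(url: str) -> bool:
    """Return True when the URL scheme is https (a first indicator only)."""
    return urlparse(url).scheme.lower() == "https"


# Hypothetical quotation-form URLs, invented for illustration.
quote_form_urls = {
    "Company X": "https://insurer-x.example/quote",
    "Company Y": "http://insurer-y.example/quote",
    "Company Z": "HTTPS://insurer-z.example/quote",
}

# Companies whose quotation form is not served over https.
insecure = [name for name, url in quote_form_urls.items() if not uses_https(url)]

print(insecure)
```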

9.1 Summary of the research findings

Table 5 gives a summary of the findings in the experiment, highlighting the aspects found that were not in compliance with PoPI based on the research scope as well as high level recommendations.


Table 5: Summary of experiment findings

| Experiment findings | PoPI reference and recommendation | Comply | Recommendations |
|---|---|---|---|
| Opt out: Some contacts received were related to the opt out preference (excluding service provider contacts and companies not in the sample). Thus, non-compliance with the opt out preference. Opt in preferences on websites: Not all companies in the sample made provision for opt in or opt out preferences on their website. Non-compliance with obtaining consent for direct marketing. | Section 69 (1)(2)(3) [17]: Direct marketing is prohibited unless consent is obtained. A data subject can be approached once to opt in. Existing customers can be contacted about similar products or services, but an opt out must be included. | No | Direct marketing communications should include opt in and opt out preferences for data subjects. Databases with customer information should incorporate direct marketing preferences, which should be maintained. Include opt in preferences for marketing at the point of data collection, such as a website. |
| Consent: A number of companies that contacted the data subject were not part of the sample, which could indicate that they are processing personal information without the consent or knowledge of the data subject. | Section 11 (1) [17]: Processing should only be carried out with consent, for contractual purposes, obligations imposed by law to protect legitimate interests, or for performance of public duty. | No | Ensure that all processing of personal information of data subjects is lawful and that consent for processing is obtained where applicable, especially for direct marketing. |
| Privacy disclaimers: Only some of the sample companies had a privacy disclaimer on their website. This indicates non-compliance with openness. | Section 18 (1) [17]: Making sure the data subject is aware of the purpose of information collection and other aspects. | No | Include privacy disclaimers and policies on, for instance, organisational websites. |
| Opt out preference in SMS’s: Almost half of the SMS’s received did not provide an option to opt out. Where this was included it stated that standard fees would apply, which is not in line with PoPI, unless standard fees are defined by the companies as “no cost” to the individual. | Section 69 (1)(2)(3) [17]: Direct marketing is prohibited unless consent is obtained. A data subject can be approached once to opt in. Existing customers can be contacted about similar products or services, but an opt out must be included. Opt outs should be free of charge. | No | Include opt in and opt out preference options in all communication with data subjects, free of charge. |
| Security: The majority of the companies did not use a secure website when processing personal information. | Section 19 (1) [17]: The responsible party must secure the integrity and confidentiality of personal information that it processes. | No | Implement secure processing of personal information on websites, such as https. |

9.2 Limitations

For the purpose of the experiment it was assumed that companies were in the process of becoming compliant with PoPI as, to date, PoPI has been promulgated for three years and companies will have only one year to become compliant once it is in effect. A limitation was that the conditions of PoPI, apart from those relating to the establishment of the Information Regulator, were not yet effective, which means that companies do not have to be compliant as yet unless they are multinational organisations operating in other jurisdictions with data protection laws. This could be the reason why the results of the research indicated non-compliance for certain sections and conditions of PoPI. Taking into consideration that it could take between three and five years to become compliant, it is anticipated that companies should have started to implement measures to prepare for compliance.

Another limitation of the research project was the limited time frame available to monitor communication received on the cellphones and e-mail addresses. This limitation arose because the project timeline had to be in line with university year module timelines. A longer time period could be valuable in determining if all companies that were not part of the sample would continue to contact the data subject without an opt out preference.

A further limitation was that some of the cellphone numbers used could have previously belonged to other people; therefore, some of the communications received via SMS during the research might have been meant for the previous owner of a cellphone number. Not all communications were therefore necessarily applicable to the research.

Another limitation to consider in the research project is that there was no control over the information processed in line with the RICA Act of 2002 [43] by the store and the service provider from whom the SIM cards were purchased. Personal information could also have been leaked during this process.

A larger sample across the cellphone service providers is required so as to include all cellphone service providers in the experiment and thus allow further investigation into where the information is shared. Additional control groups, such as a control group for the internet service providers, could also be valuable to establish any data leakage from the creation of the e-mail addresses. The relationship between the e-mail addresses and the contacts received was not analysed in this study, and further research could investigate any potential correlation with the communications received.

A further factor to consider is phishing attempts that could have been used to obtain personal information for malicious purposes [61], especially in the instances where the data subject was contacted in respect of competition prizes that they had supposedly won. However, the data subject did not respond to any of the SMS’s or e-mails received.

10. CONCLUSION

The objective of this research was to establish whether certain conditions of PoPI were complied with from a direct marketing perspective. The insurance industry of South Africa formed the sample population and an experimental design was used. Compliance was investigated by establishing whether the consumer (data subject) was contacted by the companies in the selected sample if they had not opted in for any communication and direct marketing.

The results indicated that a number of companies in the sample (excluding the cellphone service providers) that contacted the data subject did not have the data subject’s consent for direct marketing. Some of the companies that contacted the data subject were not even part of the original sample where the cellphone and e-mail addresses had been deposited, indicating that data could have been shared or leaked. In addition, almost half of the SMS’s received did not include the option to opt out, and the majority of the insurance company websites did not have a privacy disclaimer or the option to opt in or opt out when requesting an online insurance quote.

Future research using a longer time frame, inclusion of all cellphone providers and additional control groups would be necessary to monitor the flow of personal information and compliance with the direct marketing requirements of PoPI in the insurance industry, and in other industries, in South Africa. Additional value will be added if the experiment is repeated once PoPI is in effect.

11. REFERENCES

[1] L.F. Chen and R. Ismail, “Information Technology program students’ awareness and perceptions towards personal data protection and privacy”, Proceedings: 3rd International Conference on Research and Innovation in Information Systems, ICRIIS, pp. 434–438, 2013.

[2] C. Doyle and M. Bagaric, “The right to privacy: appealing, but flawed”, The International Journal of Human Rights, Vol. 9 No. 1, pp. 3–36, 2005.

[3] V. Hiranandani, “Privacy and security in the digital age: contemporary challenges and future directions”, The International Journal of Human Rights, Vol. 15 No. 7, pp. 1091–1106, 2011.

[4] B. Van der Sloot, “Do privacy and data protection rules apply to legal persons and should they? A proposal for a two-tiered system”, Computer Law & Security Review, Vol. 31 No. 1, pp. 26–45, 2015.

[5] E. Goodman, “Design and ethics in the era of big data”, Interactions, Vol. 21 No. 3, pp. 22–24, 2014.

[6] B. Borena, F. Belanger and D. Ejigu, “Information Privacy Protection Practices in Africa: A Review Through the Lens of Critical Social Theory”, Proceedings: 48th Hawaii International Conference on System Sciences Information, pp. 3490–3497, 2015.

[7] M. De Bruyn, “The Protection of Personal Information Act and Its Impact on Freedom of Information”, International Business & Economics Research Journal, Vol. 13 No. 6, pp. 1315–1340, 2014.

[8] D. Milo and O. Ampofo-anti, “A not so private world”, Without Prejudice, Vol. 14 No. 09, pp. 30–32, 2013.

[9] P. Prinsloo, E. Archer, G. Barnes, Y. Chetty and D. van Zyl, “Big(ger) data as better data in open distance learning”, International Review of Research in Open and Distance Learning, Vol. 16 No. 1, pp. 284–306, 2015.

[10] PricewaterhouseCoopers (PwC), “The protection of personal information bill: The journey to implementation”, [Online]. Available: https://www.pwc.co.za/en/assets/pdf/popi-white-paper-2011.pdf (Accessed 24 February 2016), 2011.

[11] G. Greenleaf, “Sheherezade and the 101 Data Privacy Laws: Origins, Significance and Global Trajectories”, Journal of Law, Information & Science, Vol. 23 No. 1, pp. 1–48, 2014.

[12] H.N. Olinger, J.J. Britz, and M.S. Olivier, “Western privacy and/or Ubuntu? Some critical comments on the influences in the forthcoming data privacy bill in South Africa”, International Information and Library Review, Vol. 39 No. 1, pp. 31–43, 2007.

[13] Directive 95/46/EC of the European Parliament and of the Council of 1995. [Online]. Available: http://ec.europa.eu/justice/policies/privacy/docs/95-46-ce/dir1995-46_part1_en.pdf, 1995.

[14] General Data Protection Regulation (GDPR) of 2012. [Online]. Available: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52012PC0011&from=EN, 2012.

[15] Hunton and Williams, “The proposed EU General Data Protection Regulation: A guide for in-house lawyers”, [Online]. Available: https://www.huntonregulationtracker.com/files/Uploads/Documents/EU%20Data%20Protection%20Reg%20Tracker/Hunton_Guide_to_the_EU_General_Data_Protection_Regulation.pdf, 2015.

[16] Y. Diaz-Tellez, E.L. Bodanese, S.K. Nair and T. Dimitrakos, “An architecture for the enforcement of privacy and security requirements in internet-centric services”, Proceedings: 11th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom-2012 – 11th IEEE International Conference on Ubiquitous Computing and Communications (IUCC-2012), pp. 1024–1031, 2012.

[17] Protection of Personal Information Act (PoPI) 4 of 2013, Vol. 581, No. 37067, Act No. 4 of 2013. Cape Town, South Africa, [Online]. Available: http://www.acts.co.za/consumer-protection-act-2008/index.html, 2013.

[18] Promotion of Access to Information Act (PAIA) 2 of 2000, South African Government, [Online]. Available: http://www.acts.co.za/promotion-of-access-to-information-act-2000/index.html, 2000.

[19] The Presidency, President Zuma appoints Chairperson and members of the Information Regulator, [Online]. Available: http://www.thepresidency.gov.za/pebble.asp?relid=22940, 2016.

[20] L. Pillay, “The partial commencement of the Protection of Personal Information Act, 2013”, Without Prejudice, Vol. 14 No. 8, p. 54, 2014.

[21] S. Wilson, “Big data held to privacy laws, too”, Correspondence, Macmillan Publishers Limited., Vol. 519, p. 414, 2015.

[22] B.N. Magolego, “Personal data on the Internet – can POPI protect you?”, De Rebus, No. 548, pp. 20–22, 2014.

[23] Consumers Protection Act (CPA), 68 of 2008. South African Government, [Online]. Available: http://www.acts.co.za/consumer-protection-act-2008/, 2015.

[24] M. Calaguas, “South African Parliament Enacts Comprehensive Data Protection Law: An Overview of the Protection of Personal Information Bill”, Africa Law Today, No. 3, pp. 1–6, 2013.

[25] I.P. Swart, M.M. Grobler, and B. Irwin, “Visualization of a data leak”, Proceedings: 21st Conference on the Domestic Use of Energy, pp. 1–8, 2013.

[26] J.G. Botha, M.M. Eloff and I. Swart, “The effects of the PoPI Act on small and medium enterprises in South Africa”, Proceedings: Information Security for South Africa (ISSA2015), pp. 1–8, 2015.

[27] H.G. Miller and P. Mork, “From data to decisions: a value chain for big data”, IT Professional, Vol. 15 No. 1, pp. 57–59, 2013.

[28] European Commission, A European strategy on the data value chain. Retrieved from https://ec.europa.eu/digital-agenda/en/news/elements-data-value-chain-strategy, 2013.

[29] B. Hamann and S. Papadopoulos, “Direct marketing and spam via electronic communications: An analysis of the regulatory framework in South Africa”, De Jure, Vol. 47 No. 1, pp. 42–62, 2013.

[30] S. Dolnicar and Y. Jordaan, “A Market-Oriented Approach to Responsibly Managing Information Privacy Concerns in Direct Marketing”, Journal of Advertising, Vol. 36 No. 2, pp. 123–149, 2007.

[31] Y.L. Lai and K.L. Hui, “Internet Opt-In and Opt-Out: Investigating the Roles of Frames, Defaults and Privacy Concerns”, Proceedings: 2006 ACM SIGMIS CPR Conference on Computer Personnel Research, pp. 253–263, 2006.

[32] S. Bellman, E.J. Johnson and G.L. Lohse, “On site: to opt-in or opt-out? It depends on the question”, Communications of the ACM, Vol. 44 No. 2, pp. 25–27, 2001.

[33] D. Millard, “Hello, POPI? On cold calling, financial intermediaries and advisors and the Protection of Personal Information Bill”, Journal of Contemporary Roman-Dutch Law, Vol. 76, pp. 604–622, 2013.

[34] Financial Advisory and Intermediary Services (FAIS) Act, 2002 (Act No. 37 of 2002). [Online]. Available: http://www.acts.co.za/financial-advisory-and-intermediary-services-act-2002/, 2002.

[35] S. Widup, G. Bassett, D. Hylender, B. Rudis, M. Spitler, “2015 Protected Health Information Data Breach Report”, [Online]. Available: http://www.verizonenterprise.com/resources/reports/rp_2015-protected-health-information-data-breach-report_en_xg.pdf, 2015.

[36] PricewaterhouseCoopers (PwC), “Turnaround and transformation in cybersecurity”, [Online]. Available: https://www.pwc.com/gx/en/consulting-services/information-security-survey/assets/pwc-gsiss-2016-financial-services.pdf, 2015.


[37] CIBECS, “2012 State of business data protection in South Africa”, [Online]. Available: http://offers.cibecs.com/state-of-business-data-protection-in-sa, p. 14, 2012.

[38] J.D. Brewer, “The A-Z of Social Research Positivism”, SAGE Research Methods, pp. 236–238, 2015.

[39] D. Cohen and B. Crabtree, The Positivist Paradigm, [Online]. Available: http://www.qualres.org/HomePosi-3515.html, 2008.

[40] K. Staller, “Encyclopedia of Research Design”, Encyclopedia of Research Design: Qualitative Research, pp. 1159-1164, 2010.

[41] R.L. Miller and J.D. Brewer, “The A-Z of Social Research Research design”, SAGE Research Methods, pp. 263–269, 2003.

[42] H.J. Seltman, “Experimental Design and Analysis”, p. 35., 2013.

[43] Regulation of Interception of Communication and Provision of Communication-related Information Act (RICA), Act 70 of 2002, South African Government, [Online]. Available: http://www.acts.co.za/regulation-of-interception-of-communications-and-provision-of-communication-related-information-act-2002/, 2002.

[44] P. Prinsloo, E. Archer, G. Barnes, Y. Chetty and D. Van Zyl, “Big(ger) data as better data in open distance learning”, International Review of Research in Open and Distance Learning, Vol. 16 No. 1, pp. 284–306, 2015.

[45] Europa, Factsheet EU-US Privacy Shield, [Online]. Available: http://ec.europa.eu/justice/data-protection/files/factsheets/factsheet_eu-us_privacy_shield_en.pdf.

[46] M. Madden and L. Rainie, Internet Science and Technology Research, Pew Research Centre 204, Privacy Perceptions, [Online]. Available: http://www.pewinternet.org/2014/11/12/privacy-perceptions/, 2014.

[47] D. Banister, Social Science Research Network, National Comprehensive Data Protection/Privacy Laws and Bills 2016 Map, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1951416, (Accessed 4 October 2016), 2016.

[48] U. Akpojivi and A. Bevan-Dye, “Mobile advertisement and information privacy perception amongst South African generation Y students”, Telematics and Informatics, Vol. 32 No 2015, pp. 1–10, 2015.

[49] G. Sterling, Survey: 99 percent of consumers will share personal info for reward, but want brands to ask permission – Global Microsoft survey offers findings about attitudes towards data sharing, Marketing Land, [Online]. Available: http://marketingland.com/survey-99-percent-of-consumers-will-share-personal-info-for-rewards-also-want-brands-to-ask-permission-130786, 2015.

[50] M.R. De Villiers, “Models for interpretive information systems research, part 1: IS research, action research, grounded theory – a meta – study and examples,” in Research Methodologies, Innovations and Philosophies in Software Systems Engineering and Information Systems, M. Mora, O. Gelman, A. Steenkamp, and M. S. Raisinghani, Eds. Hershey: IGI Global, pp. 222-237, 2012.

[51] N. Nadasen, C. Pilkington, and A. Da Veiga, “Personal information value chains in the South African insurance industry – an experiment”, Proceedings: CONF-IRM 2016 Proceedings International Conference on Information Resources Management (CONF-IRM), paper 28, May 2016.

[52] APEC, Asia Pacific Economic Cooperation. Asia Pacific Economic Cooperation (APEC) privacy framework, [Online]. Available: http://publications.apec.org/publication-detail.php?pub_id=390, 2005.

[53] Organisation for Economic Co-operation and Development (OECD), OECD privacy principles, [Online]. Available: https://www.oecd.org/sti/ieconomy/oecd_privacy_framework.pdf, 2013.

[54] British Standard BS 10012:2009, Data protection – Specification for a personal information management system, BSI, [Online]. Available: http://shop.bsigroup.com/ProductDetail/?pid=000000000030175849, 2009.

[55] ISO IEC 29100.2011 Information technology — Security techniques — Privacy framework, [Online]. Available: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=45123, 2011.

[56] Fair and Accurate Credit Transactions Act (FACTA), Public Law 108-159-DEC.4, [Online]. Available: https://www.congress.gov/108/plaws/publ159/PLAW-108publ159.pdf, 2003.

[57] Health Insurance Portability and Accountability Act of 1996 (HIPAA), Public Law 104-191, [Online]. Available: https://www.gpo.gov/fdsys/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf, 1996.

[58] Y. Jordaan, “Information privacy concerns of different South African socio-demographic groups”, Southern African Business Review, Vol. 11 No. 2, pp. 19-38, 2007.

[59] Constitution of the Republic of South Africa, [Online]. Available: http://www.gov.za/documents/constitution/constitution-republic-south-africa-1996-1, 1996.

[60] Fair Information Practice Principles, IT Law Wikia, [Online]. Available: http://itlaw.wikia.com/wiki/Fair_Information_Practice_Principles

[61] C.P. Pfleeger, S.L. Pfleeger and J. Margulies, Security in Computing, Prentice-Hall Inc., USA, fifth edition, chapter 4, pp.274, 2015.


[62] Direct Marketing Association of SA, [Online]. Available: https://www.nationaloptout.co.za/, 2016

[63] Norton Rose Fulbright, PoPI and Insurance, [Online]. Available: http://www.nortonrosefulbright.com/knowledge/publications/74156/popi-and-insurance, 2013.

[64] G. Payne and J. Payne, “Key concepts in social research”, Experiments, Sage Publications, pp. 85-88, 2016.


SPECIFIC EMITTER IDENTIFICATION FOR ENHANCED ACCESS CONTROL SECURITY

J.N. Samuel and W.P. du Plessis∗

∗ Department of Electrical, Electronic and Computer Engineering, University of Pretoria, South Africa. Email: [email protected], [email protected]

Abstract: The application of specific emitter identification (SEI) to access control using radio-frequency (RF) access remotes is presented. Existing RF access remotes are vulnerable to a number of attacks including replay attacks due to their reliance on digital codes. SEI can overcome many vulnerabilities by exploiting the effect of hardware tolerances on the analogue signals transmitted by access remotes. A proof-of-concept SEI system was developed to investigate whether it is possible to distinguish between the RF signals produced by nominally-identical access remotes. It was determined that it is possible to distinguish between the access remotes with an accuracy of 98% with no false positives, even when tested against unknown remotes with the correct digital code and replay attacks.

Key words: Specific emitter identification (SEI), access control, and software-defined radio (SDR).

1. INTRODUCTION

Radio-frequency (RF) access remotes such as the one shown in Fig. 1 are used to open gates to residential estates, and doors to houses and garages. On this basis they provide security as only people having an access remote with the correct code are able to gain access to these areas, akin to a key. However, the digital signal produced by these access remotes can easily be determined using low-cost RF receivers and reproduced by an RF transmitter [1–3]. This process allows illegitimate access to residential estates, houses and garages. This observation motivates the need for making the systems that receive signals from RF access remotes more robust to access remotes being cloned. This paper demonstrates how conventional RF access remotes can be uniquely identified using low-cost software-defined radio (SDR) receivers and specific emitter identification (SEI), thereby increasing security.

The fact that the coded signal transmitted by access remotes is digital makes it extremely simple to clone a static code [1], and in fact, many types of access remote are programmed by cloning an existing signal [4]. In an attempt to minimise this problem, codes which vary each time an access remote is used are employed in newer access remotes, but even access remotes using such rolling codes are subject to attack by cloning the digital signal [2, 3]. The problem with digital codes is inherently that they are simple to intercept and reproduce.

By comparison, SEI, also known as radio-frequency fingerprinting (RFF) or physical-layer identification, is a technique used to uniquely identify RF transmitters, even those of the same make and model, using the analogue characteristics of their transmitted RF signals [5]. This means of identification is possible due to the hardware tolerances of the RF circuitry having unique, measurable effects on the analogue signal without affecting the digital data being transmitted [6]. SEI is thus able to alleviate the mimicking or spoofing of the identities of RF devices as the analogue identifying characteristics exploited by SEI are inherently difficult to spoof [7, 8]. In this way, SEI can be used to enhance the security of access-control systems using RF access remotes.

Figure 1: A typical RF access remote.

This work is based on the research supported in part by the National Research Foundation of South Africa (NRF) (Grant specific unique reference number (UID) 85845). The NRF Grantholder acknowledges that opinions, findings and conclusions or recommendations expressed in any publication generated by the NRF supported research are that of the author(s), and that the NRF accepts no liability whatsoever in this regard.

A proof-of-concept SEI system for access control using RF access remotes is described. This system is able to distinguish access remotes with an accuracy of 98% and correctly rejects all unknown signals presented to it. Only 1.2 s of training data is required per remote which should be granted access, and access control is accomplished with single 23.5-ms bursts. The system was tested against a number of challenging attacks including unknown remotes with the correct digital code and replay attacks. Only the receiver system has to be changed, leaving the access remotes unmodified, thereby removing the expense associated with more complex remotes. The success of this demonstration suggests that this is a viable approach to increasing the security which can be achieved using even the most basic conventional RF access remotes.

Based on: “Specific Emitter Identification for Enhanced Access Control”, by J.N. Samuel and W.P. du Plessis which appeared in the Proceedings of Information Security South African (ISSA) 2016, Johannesburg, 17 & 18 August 2016. © 2016 IEEE


Figure 2: SEI system overview (radio signal → Acquisition → digitised radio signal → Signal processing → processed radio signal → Feature extraction → features → Classifier → emitter label).

Section 2 presents the design and implementation of a proof-of-concept software system that performs SEI to distinguish between two nominally-identical access remotes. Section 3 describes the results obtained from the study. Section 4 concludes the paper.

2. SYSTEM DESCRIPTION

The overall SEI system depicted in Fig. 2 consists of the elements which are considered below.

1. The acquisition system acquires the RF signals produced by the access remotes. It then stores the data in a digital format for later processing.

2. Signal processing is then performed on the stored RF signals to remove any arbitrary variances in the signals that may distort the signals and affect signal classification.

3. The feature-extraction subsystem then extracts distinct features from the processed RF signals.

4. The classifier subsystem then takes the extracted features and builds an association between the RF signals and the transmitters from which they were produced. It then uses this association to classify and identify RF bursts produced by an access remote.

2.1 Operating Characteristics of RF Access Remotes

The RF access remotes considered in this study operate in the portion of the industrial, scientific and medical (ISM) band at 403.55 MHz [4]. This band is intended for the operation of equipment designed to use local RF energy for purposes other than telecommunications [9].

Figure 3: Captured signal from an access remote (normalised amplitude versus time in ms).

Figure 4: A HackRF One SDR.

These access remotes transmit a modulated sequence of bits to the receiver in order to open or close the gate. Fig. 3 shows a portion of a signal received from an access remote. The modulation takes the form of pulse width modulation (PWM) in which a very long pulse denotes the start of a burst, followed by shorter pulses of differing lengths which denote the code bits [1]. The code of the signal in Fig. 3 would be 100000000001 if wide and short data pulses correspond to ones and zeros respectively. This simple modulation makes these access remotes susceptible to replay attacks allowing illegitimate access to residential estates, houses and garages.
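The mapping from pulse widths to code bits described above can be sketched as follows. This is only an illustration: the decision threshold is an assumption (the midpoint of the median short and long bit widths in Table 1), the function name is hypothetical, and the example widths are illustrative rather than measured.

```python
# Hypothetical threshold: midway between the median short (358 us) and
# long (701 us) bit widths listed in Table 1. Not taken from the paper.
WIDTH_THRESHOLD_US = (358 + 701) / 2


def decode_pwm(pulse_widths_us, threshold_us=WIDTH_THRESHOLD_US):
    """Map each data-pulse width to a bit: wide -> '1', short -> '0'."""
    return "".join("1" if w > threshold_us else "0" for w in pulse_widths_us)


# Illustrative widths (not measured data): one long, ten short, one long.
example_widths_us = [701] + [358] * 10 + [701]
print(decode_pwm(example_widths_us))  # prints 100000000001
```

Because any receiver can perform this decoding, the digital code on its own offers no protection once a single burst has been observed.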

For the development of this system, eleven RF access remotes with the same digital access code were considered (Remotes 1 to 11). Remote 1 was also tested using a different digital code (Remote 12). To test the effect of replay attacks, the signals of Remotes 1 to 3 were recorded and replayed to the receiver system using a HackRF One SDR [10], shown in Fig. 4, with a sampling rate of 10 Msps to test whether cloning can be countered (Remotes 13 to 15).

2.2 Signal Acquisition

The signal acquisition system consists of two processes, namely the recording process and burst-extraction process.

For signal recording, an RTL2832U SDR with an R820T tuner, shown in Fig. 5, was utilised with the low-level free and open-source software (FOSS) drivers available online [11]. The selected SDR performs quadrature sampling at up to 2.56 Msps without missing samples, has 8-bit analogue-to-digital converter (ADC) resolution [12], and is relatively inexpensive with prices ranging from $15 to $25 [13]. The SDR receiver was configured to have a centre frequency of 403.5 MHz and a sampling rate of 1 Msps.

Figure 5: An RTL2832U SDR receiver.

Five 1-s recordings of each of the fifteen access remotes were made, and the recorded samples were stored in binary files for later processing.

2.3 Signal processing

The recorded signals were further processed in order to remove any arbitrary variances in the bursts that are due to noise, amplitude variances and frequency offsets.

The spectrum of a signal recorded from an access remote is shown in Fig. 6(a), where noise, spurious signals and a frequency offset of approximately 75 kHz can be seen. The first step taken in processing was thus to mix the signal to a centre frequency of 0 Hz to ensure that all the signals are within the passband of the filter applied in the next step. The next step was to reduce the noise through filtering. A 40-coefficient finite impulse response (FIR) filter with a Blackman window was used due to its low sidelobes [14]. The spectrum after frequency correction and filtering is shown in Fig. 6(b).
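The two steps above (mixing to 0 Hz, then low-pass filtering) can be sketched in plain Python as shown below. This is a minimal sketch, not the authors' implementation: the windowed-sinc design method, the 50 kHz cutoff and all function names are assumptions, since the paper specifies only the filter length and the window.

```python
import cmath
import math


def mix_to_baseband(samples, offset_hz, sample_rate_hz):
    """Shift a complex signal down by offset_hz so it is centred on 0 Hz."""
    step = -2j * math.pi * offset_hz / sample_rate_hz
    return [s * cmath.exp(step * n) for n, s in enumerate(samples)]


def blackman_lowpass(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass FIR coefficients using a Blackman window."""
    fc = cutoff_hz / sample_rate_hz  # normalised cutoff in cycles/sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # scale for unity gain at 0 Hz


def fir_filter(samples, taps):
    """Direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out


# Assumed values: 75 kHz offset and 1 Msps from the text, 50 kHz cutoff invented.
taps = blackman_lowpass(num_taps=40, cutoff_hz=50e3, sample_rate_hz=1e6)
```

A recording would then be processed as `fir_filter(mix_to_baseband(samples, 75e3, 1e6), taps)`.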

Following filtering, the amplitude representation of each burst was normalised to make the median of the high signal level 1 as shown in Fig. 7. This normalisation prevents the feature-extraction subsystem from producing feature vectors that differ due to amplitude variances between bursts. Such differences would cause the misclassification of bursts even if they were produced from the same access remote.
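A minimal sketch of this normalisation is given below. The paper does not state how the "high" samples were selected, so the rule used here (samples above half of the peak amplitude) and the function name are assumptions made for illustration.

```python
from statistics import median


def normalise_burst(amplitudes, high_fraction=0.5):
    """Scale amplitudes so that the median of the 'high' samples equals 1.

    Samples above high_fraction * max(amplitudes) are treated as the high
    signal level; this selection rule is an assumption, not from the paper.
    """
    peak = max(amplitudes)
    high = [a for a in amplitudes if a > high_fraction * peak]
    scale = median(high)
    return [a / scale for a in amplitudes]


# Illustrative values only: a pulse with a high level of roughly 4.
print(normalise_burst([0.0, 0.1, 4.0, 4.2, 3.8, 0.0]))
```

Using the median rather than the peak makes the scale factor less sensitive to overshoot and noise spikes on the high level.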

The individual bursts are all identical for a given code, so the next step was to extract the bursts from the stored RF signals produced by each access remote. A threshold of 0.5 was utilised to determine the positions of the rising edges of each pulse. The rising edges were used rather than the falling edges as the positions of the falling edges vary depending on the values of the data bits.

Figure 6: The magnitude of the frequency spectrum (normalised amplitude in dB versus offset frequency in MHz). (a) Before processing. (b) After frequency correction and filtering.

Figure 7: The amplitude of an access-remote burst (normalised amplitude versus time in ms).

The durations of the various portions of a burst are shown in Table 1 where a period is the time from one rising edge to the next, while a width is the time from a rising edge to the next falling edge. Despite the large variations in the analogue parameter values, the same digital code is transmitted. A start pulse was identified as having a period of over 10.5 ms. A burst was then taken as starting 100 samples before the rising edge of the current start pulse and ending 100 samples before the rising edge of the next start pulse. The offset of 100 samples ensured that the rising edge of the start pulse is included in the burst which contains the start pulse as shown in Fig. 7.
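The burst-extraction rule described above can be sketched as follows. The threshold of 0.5, the 10.5 ms start-pulse period and the 100-sample lead come from the text; the function names and the handling of the recording edges are assumptions.

```python
def rising_edges(amplitudes, threshold=0.5):
    """Indices where the amplitude crosses the threshold from below."""
    return [n for n in range(1, len(amplitudes))
            if amplitudes[n - 1] < threshold <= amplitudes[n]]


def extract_bursts(amplitudes, sample_rate_hz=1_000_000,
                   start_period_s=10.5e-3, lead_samples=100):
    """Split a normalised recording into bursts beginning just before a start pulse."""
    edges = rising_edges(amplitudes)
    min_period = start_period_s * sample_rate_hz
    # A start pulse is a rising edge whose period (the time to the next
    # rising edge) exceeds 10.5 ms.
    starts = [e for e, nxt in zip(edges, edges[1:]) if nxt - e > min_period]
    bursts = []
    for begin, end in zip(starts, starts[1:]):
        first = max(begin - lead_samples, 0)
        bursts.append(amplitudes[first:end - lead_samples])
    return bursts
```

Only complete bursts are returned: the partial bursts at the beginning and end of a recording are dropped because they are not bounded by two start pulses.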


Table 1: Burst subsection durations.

Description    Minimum     Median      Maximum
Burst length   23.25 ms    23.48 ms    25.72 ms
Start period   10.73 ms    10.82 ms    12.86 ms
Start pulse    10.40 ms    10.48 ms    12.48 ms
Bit period     1 033 µs    1 047 µs    1 122 µs
Long bit       678 µs      701 µs      777 µs
Short bit      334 µs      358 µs      432 µs

Figure 8: The phase of an access-remote burst (phase in radians versus time in ms). (a) All data. (b) Transmitter off data discarded and phase unwrapped.

It must be noted that the phase sections corresponding to parts of the burst where the access remote is not transmitting are random, as seen in Fig. 8(a), and will negatively affect the classification. As a result, an amplitude threshold is applied to remove samples that correspond to data when the access remote is not transmitting. The phase representation that results from applying a threshold of 0.5 to the signal amplitude and unwrapping the phase is shown in Fig. 8(b). As seen in Fig. 8(b), the random fluctuations due to the portions of the burst where the access remote is not transmitting have been removed, leaving only those signal artifacts that correspond to the access remote’s hardware tolerances. The signal samples that do not correspond to active transmission of the access remote are thus discarded prior to feature extraction for both amplitude and phase.
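The gating and unwrapping steps can be sketched as below, assuming the burst is available as amplitude-normalised complex baseband samples. The function names are hypothetical, and the sketch unwraps across the concatenated active segments, which may differ from how the phase in Fig. 8(b) was produced.

```python
import cmath
import math


def unwrap(phases):
    """Remove 2*pi jumps so that consecutive samples differ by at most pi."""
    out = list(phases[:1])
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out


def active_phase(samples, threshold=0.5):
    """Unwrapped phase of only the samples whose amplitude exceeds the threshold."""
    kept = [s for s in samples if abs(s) > threshold]
    return unwrap([cmath.phase(s) for s in kept])
```

Samples below the threshold are dropped before the phase is computed, so the random phase of the receiver noise never reaches the feature-extraction stage.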

Figure 9: The frequency of an access-remote burst (frequency in kHz versus time in ms).

A further problem with using phase information is that phases are subject to arbitrary offsets. This characteristic is clearly seen in Fig. 8(b), where each segment of phase information has a different initial phase value. These arbitrary phase offsets carry no information about the transmitter properties but can still have a significant influence on the output of a classifier. Frequency inherently provides the same information as phase as frequency is simply the gradient of phase. However, frequency is not subject to arbitrary offsets as can be seen by comparing the phase of a burst in Fig. 8(b) to the frequency of the same burst in Fig. 9. As a result, the frequency of the signal was used rather than the phase.
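The relationship between phase and frequency used here can be illustrated with a short sketch. The discrete estimator below (the phase step between adjacent complex samples) is one common way of taking the gradient of phase and is an assumption; the paper does not state which estimator was used.

```python
import cmath
import math


def instantaneous_frequency(samples, sample_rate_hz):
    """Frequency in Hz estimated from the phase step between adjacent samples."""
    scale = sample_rate_hz / (2 * math.pi)
    return [cmath.phase(b * a.conjugate()) * scale
            for a, b in zip(samples, samples[1:])]


# Two illustrative 20 kHz tones that differ only by a constant phase offset.
fs = 1e6
tone_a = [cmath.exp(1j * (2 * math.pi * 20e3 * n / fs)) for n in range(50)]
tone_b = [cmath.exp(1j * (2 * math.pi * 20e3 * n / fs + 1.3)) for n in range(50)]
freq_a = instantaneous_frequency(tone_a, fs)
freq_b = instantaneous_frequency(tone_b, fs)
```

Both tones give the same frequency estimate even though their phases differ by 1.3 rad throughout, which is the property that makes frequency the more suitable input to the classifier.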

Table 2 lists the number of bursts extracted for each remote. In all cases, the first four 1-s recordings were used as training data, with the final 1-s recording being used as test data (see Section 2.6). Remote 4 has fewer bursts than the other remotes as its bursts are significantly longer than those of the other remotes (> 25.7 ms versus a maximum of < 23.6 ms), so it transmits fewer bursts during the 1-s recordings.

2.4 Signal difference inspection

Once signal processing is complete, the true differences between the signals produced by each access remote can be determined. For SEI to be successful, it is imperative that the characteristics of the signal produced by a specific transmitter be consistent for all signals produced by that transmitter, while being appreciably distinct from the characteristics produced by another transmitter.

The amplitude and phase representations of the first bursts produced by Remote 1 on consecutive recordings are shown in Fig. 10. While there are differences between the two recordings, the overall responses display remarkable similarities. While not shown, the responses of all remotes display similarly consistent results, thereby satisfying the first condition for SEI to be successful.

Table 2: The number of bursts extracted from the recorded data for each remote.

Remote     1    2    3    4    5    6    7    8    9    10   11   12†  13‡  14‡  15‡
Training   164  164  165  149  166  164  164  164  164  164  164  164  164  164  164
Testing    41   41   41   37   41   41   42   41   41   41   42   41   41   41   41

† Remote 12 is Remote 1 with a different code.
‡ Remotes 13 to 15 are the replay attacks of Remotes 1 to 3.

Figure 10: Comparison between two bursts for Remote 1 (Recording 1 and Recording 2). (a) Amplitude (normalised amplitude versus time in ms). The two curves are almost identical. (b) Frequency (frequency in kHz versus time in ms). The two curves are almost identical.

The amplitude and phase representations of the first bursts produced by Remotes 1 and 2 are compared in Fig. 11. The amplitude responses of the two remotes in Fig. 11(a) display only minor differences, which is anticipated as the modulation used depends on signal amplitude. Based on this observation, the amplitude representations of the access remotes are unlikely to achieve the ultimate goal of classifying the bursts emitted by the access remotes. Observing the differences between the phase representations in Fig. 11(b), it is seen that the phase representations for each access remote differ significantly. These phase differences are more distinct than the differences seen in the amplitude representation. While not shown, the amplitude differences of some of the other remotes differ more significantly than those shown in Fig. 11(a), and the phase differences between all remotes are significant. On this basis, the phase representations of the access remotes are expected to be better for the purposes of SEI, and the second criterion for successful SEI is thus fulfilled.

Figure 11: Comparison between bursts for two remotes (Remote 1 and Remote 2). (a) Amplitude (normalised amplitude versus time in ms). The two curves are very similar. (b) Frequency (frequency in kHz versus time in ms).

2.5 Feature Extraction

While it is possible to present the entire amplitude or phase representation to the classifier, this would be inefficient and may hinder classification accuracy. This is because each sample in the phase and amplitude representations would be treated as a feature leading to an exorbitant number of features, and each remote would have a different number of features due to the varying burst lengths. Instead, a set of values that effectively summarises the shape of each representation and which ensures that all bursts have the same number of values is calculated. These values then serve as the features for each signal representation, and the process is called feature extraction [15].

Table 3: Confusion matrix with all remotes and all available training bursts considered during training.

Remote  1   2   3   4   5   6   7   8   9   10  11  12† 13‡ 14‡ 15‡ U§  Correct
1       41  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   100%
2       0   29  0   0   0   0   0   0   0   0   0   0   0   0   0   12  71%
3       0   0   40  0   0   0   0   0   0   0   0   0   0   0   0   1   98%
4       0   0   0   37  0   0   0   0   0   0   0   0   0   0   0   0   100%
5       0   0   0   0   41  0   0   0   0   0   0   0   0   0   0   0   100%
6       0   0   0   0   0   41  0   0   0   0   0   0   0   0   0   0   100%
7       0   0   0   0   0   0   42  0   0   0   0   0   0   0   0   0   100%
8       0   0   0   0   0   0   0   41  0   0   0   0   0   0   0   0   100%
9       0   0   0   0   0   0   0   0   41  0   0   0   0   0   0   0   100%
10      0   0   0   0   0   0   0   0   0   41  0   0   0   0   0   0   100%
11      0   0   0   0   0   0   0   0   0   0   42  0   0   0   0   0   100%
12†     0   0   0   0   0   0   0   0   0   0   0   41  0   0   0   0   100%
13‡     0   0   0   0   0   0   0   0   0   0   0   0   41  0   0   0   100%
14‡     0   0   0   0   0   0   0   0   0   0   0   0   0   41  0   0   100%
15‡     0   0   0   0   0   0   0   0   0   0   0   0   0   0   41  0   100%

† Remote 12 is Remote 1 with a different code.
‡ Remotes 13 to 15 are replay attacks of Remotes 1 to 3.
§ Unknown (confidence level too low).

Statistical measures are typically used in the SEI of wireless devices such as Global System for Mobile Communications (GSM) cellular telephones [16]. For the development of this system, statistical feature extraction was utilised. Each burst was divided into five equally sized sub-regions [16], with the mean, variance, skewness and kurtosis of both the amplitude and phase being computed for each sub-region. This led to a total of 40 features per burst (5 regions, each with 4 statistical measures for both amplitude and phase), which together represent a single feature vector.
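The statistical feature extraction described above can be sketched as follows. The population (biased) estimators, the non-excess definition of kurtosis, the rounding used to place the region boundaries and the function names are all assumptions; the paper states only which four statistics were computed and how many regions were used.

```python
def moments(values):
    """Mean, variance, skewness and kurtosis using population estimators."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return [mean, 0.0, 0.0, 0.0]  # constant region: higher moments undefined
    std = var ** 0.5
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    return [mean, var, skew, kurt]


def split_regions(values, num_regions=5):
    """Split a sequence into num_regions contiguous, nearly equal parts."""
    n = len(values)
    bounds = [round(i * n / num_regions) for i in range(num_regions + 1)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(num_regions)]


def feature_vector(amplitude, phase, num_regions=5):
    """Concatenate 4 statistics x num_regions regions x 2 representations."""
    features = []
    for representation in (amplitude, phase):
        for region in split_regions(representation, num_regions):
            features.extend(moments(region))
    return features
```

With five regions the result always has 40 entries regardless of burst length, which is what allows bursts of different durations to be compared by the classifier. Each representation is assumed to contain at least five samples.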

2.6 Signal Classification

Once a set of feature vectors has been established, classification can take place. In order to perform classification, the feature vectors have to be segmented into training and test sets for each access remote. Once this is done, a classifier should be selected, trained and evaluated.

The training feature vectors serve to build an association between the feature vectors and the access remotes from which they were derived. This is done by presenting the classifier with a feature vector and an associated access remote label for all feature vectors in the training group. The test group of feature vectors is then used to evaluate the performance of the classifier. In this phase, each feature vector in the test group is presented to the classifier without a label, and the classifier returns the label of the access remote it deems most likely to correspond to the feature vector [17]. It is important to note that the training and test groups of feature vectors must be derived from different bursts. For the development of this system, training feature vectors were derived from the first four recordings of each access remote, while test feature vectors were derived from the fifth recording of each access remote.

As the object of this work was not to study classifiers, an off-the-shelf classifier from the Octave nan toolbox was used [18]. This toolbox implements a large number of classifiers, and of these, the naive Bayes classifier was selected. This classifier produces a distance metric (a form of confidence) for each of the known classes (remotes), with the class with the largest distance metric being returned as the result. The distance metric is much greater than 1 for high confidence and far smaller than 1 for low confidence, so unknown classes can be included in the results by adding an additional unknown class with a distance metric of 1. When a class is classified with high confidence, the distance metric is ≫ 1, and the result will be the relevant class. However, low confidence in the classification will lead to a distance metric which is ≪ 1, so the result will be the unknown class. In this way, unknown remotes and cases where there is low confidence in the classification can be flagged by the system.
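The rejection rule described above (an extra unknown class with a fixed distance metric of 1) can be sketched independently of the classifier itself. The function name, labels and score values below are hypothetical; the per-class scores stand in for the distance metrics returned by the toolbox, whose interface is not reproduced here.

```python
def classify_with_unknown(scores, unknown_label="unknown", unknown_score=1.0):
    """Return the best-scoring class, or unknown_label if no score exceeds 1.

    scores maps each known remote to its distance metric (larger values
    mean higher confidence).
    """
    candidates = dict(scores)
    candidates[unknown_label] = unknown_score
    return max(candidates, key=candidates.get)


# Hypothetical metrics: one confident match, then one burst with no match.
print(classify_with_unknown({"remote 1": 250.0, "remote 2": 0.003}))  # remote 1
print(classify_with_unknown({"remote 1": 0.02, "remote 2": 0.4}))     # unknown
```

This design makes rejection the default outcome: a burst is only attributed to a known remote when that remote's metric beats the fixed unknown score.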

3. RESULTS

The results obtained using the system described above will be considered below. The first question is whether remotes can be correctly identified using the proposed SEI system, and the more important question is whether known and unknown remotes are correctly distinguished. Initially, the use of all the available training bursts will be considered, but later results will evaluate the effect of using fewer bursts for training.

Table 4: Confusion matrix with five unknown remotes.

Remote  1   2   3   4   5   6   7   8   9   10  U§  Correct
1       41  0   0   0   0   0   0   0   0   0   0   100%
2       0   29  0   0   0   0   0   0   0   0   12  71%
3       0   0   40  0   0   0   0   0   0   0   1   98%
4       0   0   0   37  0   0   0   0   0   0   0   100%
5       0   0   0   0   41  0   0   0   0   0   0   100%
6       0   0   0   0   0   41  0   0   0   0   0   100%
7       0   0   0   0   0   0   42  0   0   0   0   100%
8       0   0   0   0   0   0   0   41  0   0   0   100%
9       0   0   0   0   0   0   0   0   41  0   0   100%
10      0   0   0   0   0   0   0   0   0   41  0   100%
11-15∗  0   0   0   0   0   0   0   0   0   0   206 100%

§ Unknown (confidence level too low).
∗ Not considered during training.

Table 5: Confusion matrix with ten unknown remotes.

Remote  1   2   3   4   5   U§  Correct
1       41  0   0   0   0   0   100%
2       0   29  0   0   0   12  71%
3       0   0   40  0   0   1   98%
4       0   0   0   37  0   0   100%
5       0   0   0   0   41  0   100%
6-15∗   0   0   0   0   0   412 100%

§ Unknown (confidence level too low).
∗ Not considered during training.

Table 3 shows the confusion matrix which resulted when all of the available remotes were considered and all of the available training bursts were utilised for training. The most important observation from Table 3 is that the system did not confuse any of the remotes as all results are either a correct classification or an uncertain outcome.

Examining the results more closely shows that only Remotes 2 and 3 produce uncertain results with all the other remotes being correctly classified in all cases. But even in these two cases, over 70% of the bursts transmitted by a specific remote were still correctly identified.

These results are encouraging because 98% of the test bursts were correctly identified, and more than 70% of the test bursts of any remote were correctly identified. Perhaps more significantly, the use of a known remote with a different code (Remote 12) and the inclusion of replay attacks (Remotes 13 to 15) did not affect the classification of the relevant remotes (Remotes 1 to 3) suggesting that the system is able to distinguish between different codes and between the original and recorded versions of the same remote.

A more important test from the perspective of access control is whether known and unknown remotes are correctly distinguished. Tables 4 and 5 show the confusion matrices which resulted when Remotes 11 to 15 and Remotes 6 to 15 were excluded from the training data respectively, and all available training bursts were used. The unknown remotes comprised unknown remotes (Remotes 6 to 11), one known remote with a different code (Remote 12) and three replay attacks (Remotes 13 to 15). These unknown remotes thus tested all the major potential vulnerabilities of the system.

Remarkably, Tables 4 and 5 show that the unknown remotes were correctly rejected in all cases (there were no false positives), despite the fact that all but one of the remotes transmit the same digital code. Furthermore, all the known remotes were correctly classified, with the exception that 13 bursts transmitted by known remotes were incorrectly classified as unknown remotes (3% and 6% false negatives in Tables 4 and 5 respectively). Of the false negatives, 12 (92%) were generated by Remote 2, and even then, 71% of the bursts from Remote 2 were correctly classified.
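The false-negative percentages quoted above can be checked from the confusion matrices with a few lines of arithmetic. The counts below are copied from Tables 4 and 5; the variable and function names are only illustrative.

```python
# Correctly classified test bursts for the known remotes (diagonal entries).
known_correct_table4 = [41, 29, 40, 37, 41, 41, 42, 41, 41, 41]
known_correct_table5 = [41, 29, 40, 37, 41]
rejected = 12 + 1  # Remote 2 and Remote 3 bursts classified as unknown


def false_negative_rate(correct_counts, num_rejected):
    """Fraction of known-remote test bursts that were rejected as unknown."""
    total = sum(correct_counts) + num_rejected
    return num_rejected / total


print(f"Table 4: {false_negative_rate(known_correct_table4, rejected):.1%}")  # 3.2%
print(f"Table 5: {false_negative_rate(known_correct_table5, rejected):.1%}")  # 6.5%
```

The same 13 rejected bursts are divided by 407 known-remote test bursts in Table 4 and by only 201 in Table 5, which is why the rate roughly doubles when fewer remotes are known.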

The system is thus capable of correctly distinguishing between known and unknown remotes on the basis of the analysis of single bursts 98% of the time. This excellent performance is achieved despite the unknown remotes including a number of remotes with the same code, a known remote with a different code, and replay attacks.

Of importance in an access-control scenario, the system erred on the side of classifying known remotes as unknown rather than vice versa, thereby restricting rather than allowing access when there was doubt about whether a remote is known or not. Even in the worst case (Remote 2), 71% of the transmitted bursts were correctly identified, so access will not be unnecessarily withheld. Even assuming that the 12 bursts which were incorrectly rejected in the worst case (Remote 2) were transmitted one after the other, a delay of less than 330 ms (13 × 23.5 ms ≈ 306 ms) will be incurred to ensure an accepted burst is received.

The above results all consider the case where all of the available training bursts are used. While this means that only 4 s of data was required for training, requiring fewer training bursts would speed the training process. Fig. 12 shows the system performance as a function of the number of bursts per remote used for training.

As observed above, the system erred on the side of rejecting remotes, so the false negative rates (incorrectly rejecting a known remote) in Fig. 12 are initially high and then decrease as the number of training bursts increases. However, this behaviour also means that the false positive rate (incorrectly accepting an unknown remote) is zero throughout. The number of bursts used for training thus only affects whether known remotes are granted access with unknown remotes always being denied access.

Figure 12: The performance of the system as a function of the number of training bursts (proportion in % of true positives, false negatives, true negatives and false positives versus the number of training bursts). (a) Five unknown remotes. (b) Ten unknown remotes.

No improvement in the performance of the system is achieved when more than 51 bursts are used for training with ten known remotes (Fig. 12(a)) and after 27 bursts for five known remotes (Fig. 12(b)). This behaviour is anticipated as identifying ten remotes is more complex than identifying five remotes. The one apparent anomaly in Fig. 12 is that the case with ten known remotes correctly identifies known remotes more accurately than the case with five known remotes, which is surprising as identifying a greater number of remotes is more complex. However, this apparent anomaly is explained by noting that the 13 false negative results for Remotes 2 and 3 form a greater portion of the total number of test bursts for known remotes when fewer known remotes are considered. Significantly, the results for systems trained with 51 and 27 bursts are identical to those trained with all the available training bursts, so over-fitting does not appear to be a problem in this system.

The system is thus capable of being successfully trained with comparatively short recordings. When ten known remotes are considered, only 1.2 s of data are required to obtain the 51 bursts necessary for optimum training (51 × 23.5 ms = 1.2 s). The time necessary to train the system is thus not prohibitive.
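The recording times implied by these burst counts can be checked with a short sketch. It assumes, as in the calculation above, one burst every 23.5 ms; the function and variable names are illustrative only.

```python
BURST_PERIOD_MS = 23.5  # burst repetition period quoted in the text


def recording_time_s(num_bursts, burst_period_ms=BURST_PERIOD_MS):
    """Return the recording length in seconds needed to capture num_bursts."""
    return num_bursts * burst_period_ms / 1000.0


ten_known = recording_time_s(51)   # bursts needed with ten known remotes
five_known = recording_time_s(27)  # bursts needed with five known remotes
print(f"{ten_known:.2f} s, {five_known:.2f} s")
```

Under the same assumption, the 27 bursts needed for five known remotes correspond to well under one second of recording.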

But more importantly, the number of training bursts does not change the fact that unknown remotes are always rejected by the system. This characteristic is extremely important in access-control systems where unauthorised access must be prohibited.

4. CONCLUSION

In conclusion, the development of a proof-of-concept SEI access control system for RF access remotes proved successful.

Offline classification was performed on RF bursts produced by access remotes using recordings obtained with a low-cost SDR receiver. Individual bursts could be identified as belonging to a specific known access remote or to an unknown access remote with an accuracy of 98%. More significantly, all bursts from unknown access remotes were correctly rejected by the system. This performance was achieved despite considering unknown remotes, a known remote with a different code, and replay attacks, so the system is shown to be robust against the main attack classes.

In light of these observations, SEI has been shown to hold tremendous potential to enhance the security of RF access remotes without changing the access remotes or significantly increasing the cost of the receiver. This improvement is achieved by providing physical-layer identification of the individual access remotes rather than relying only on the digital code transmitted.

ACKNOWLEDGEMENT

The authors would like to express their sincere thanks to the anonymous reviewers for their valuable comments and suggestions.

REFERENCES

[1] T. Watorowski. (2016, July) H4ck33D – hacking a 433MHz remote control. [Online]. Available: http://mightydevices.com/?p=300

[2] S. Kamkar. (2016, December) Opensesame: hacking garages in seconds. [Online]. Available: http://samy.pl/opensesame/

[3] A. Nohawk. (2016, December) Bypassing rolling code systems. [Online]. Available: https://andrewmohawk.com/2016/02/05/bypassing-rolling-code-systems/

[4] (2016, December) SENTRY learning 1/3/4 button (403MHz) (binary, trinary, french). [Online]. Available: http://www.martin-electronics.co.za/Learning B T F %20403Mhz.aspx

[5] K. I. Talbot, P. R. Duley, and M. H. Hyatt, “Specific emitter identification and verification,” Technology Review Journal, vol. 11, pp. 113–133, June 2003.

[6] B. Danev, D. Zanetti, and S. Capkun, “On physical-layer identification of wireless devices,” ACM Computing Surveys, vol. 45, no. 1, pp. 6:1–6:29, December 2012.

[7] M. Williams, M. A. Temple, and D. Reising, “Augmenting bit-level network security using physical layer RF-DNA fingerprinting,” in Global Telecommunications Conference (GLOBECOM), Miami, USA, December 2010, pp. 1–6.

[8] D. R. Reising, M. A. Temple, and M. J. Mendenhall, “Improving intra-cellular security using air monitoring with RF fingerprints,” in IEEE Wireless Communications and Networks Conference (WNC10), Sydney, Australia, April 2010.

[9] ITU. (2016, July) Article 1: Terms and definitions. [Online]. Available: http://life.itu.int/radioclub/rr/art01.htm

[10] (2017, January) Great Scott Gadgets – HackRF One. [Online]. Available: https://greatscottgadgets.com/hackrf/

[11] (2016, December) Github – osmocom/rtl-sdr. [Online]. Available: https://github.com/osmocom/rtl-sdr

[12] (2016, July) rtl-sdr – OsmoSDR. [Online]. Available: http://sdr.osmocom.org/trac/wiki/rtl-sdr

[13] (2016, December) NooElec – software defined radio. [Online]. Available: http://www.nooelec.com/store/sdr.html?SID=t9c2n3lqampvb1fvejusq7gbt3&dir=asc&limit=all&order=price

[14] F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform,” Proc. IEEE, vol. 66, no. 1, pp. 51–83, January 1978.

[15] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York, USA: Wiley-Interscience, 2000.

[16] D. R. Reising, M. A. Temple, and M. J. Mendenhall, “Improved wireless security for GMSK based devices using RF fingerprinting,” International Journal of Electronic Security and Digital Forensics, vol. 3, no. 1, pp. 41–59, March 2010.

[17] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. New Jersey, USA: Pearson Education, 2010.

[18] (2016, December) The ’nan’ package. [Online]. Available: https://octave.sourceforge.io/nan/


MANET REACTIVE ROUTING PROTOCOLS NODE MOBILITY VARIATION EFFECT IN ANALYSING THE IMPACT OF BLACK HOLE ATTACK.

E.O. Ochola*, L.F. Mejaele*†, M.M. Eloff** and J.A. van der Poll***

* School of Computing, University of South Africa, P O Box 392, Pretoria, 0003, South Africa. Email: [email protected]
† Dept. of Mathematics and Computer Science, National University of Lesotho, Roma, Lesotho. Email: [email protected]
** Institute for Corporate Citizenship, University of South Africa, Pretoria, 0003, South Africa. Email: [email protected]
*** Graduate School of Business Leadership, University of South Africa, Midrand, South Africa. Email: [email protected]

Abstract: MANETs are exposed to numerous security threats due to their characteristic features, which include the absence of a centralised control unit, open communication media, lack of infrastructure and dynamic topology. One of the most common attacks is the black hole attack, which mostly targets MANET reactive routing protocols such as AODV and DSR. Simulation scenarios of AODV- and DSR-based MANETs were conducted using Network Simulator 2 (NS-2) and NS-3, while introducing the black hole attack in each of the scenarios, to analyse the protocols’ performance. The different scenarios are generated by changing the mobility (locations) of the nodes. The performance metrics used in the analysis are throughput, end-to-end delay and packet delivery ratio. The simulation results showed that the performance of both AODV and DSR degrades in the presence of a black hole attack. Throughput and packet delivery ratio decrease when the network is attacked by a black hole, because the malicious node absorbs or discards some of the packets. End-to-end delay is also reduced in the presence of a black hole attack because a malicious node pretends to have a valid route to a destination without checking the routing table, and therefore shortens the route discovery process. The results also showed that throughput decreases slightly when the mobility of the nodes is increased in the network. An increase in the speed of the nodes decreases both packet delivery ratio and end-to-end delay. The closer the black hole node was to the source node requesting the transmission, the worse the impact. A focused analysis of AODV indicates that, even with the introduction of relatively few black hole nodes to the network, there still exists the potential to cause significant disruption to communication.

Key words: MANET security, reactive routing protocols, black hole attack, mobility.

1. INTRODUCTION

Features of Mobile Ad-hoc Networks (MANETs) such as an open medium, dynamic topology, lack of centralised management and lack of infrastructure expose them to a number of security attacks. The black hole attack is one type of attack that is more common in MANET reactive routing protocols such as Ad-hoc On-demand Distance Vector (AODV) and Dynamic Source Routing (DSR). It takes advantage of the route discovery process in reactive routing protocols. In this type of attack, a malicious node misleads other nodes in the network by pretending to have the shortest and most up-to-date route to a target node whose packets it wants to intercept. It then redirects all packets destined for the target node to itself and discards them instead of forwarding them. This paper analyses the performance of AODV and DSR when attacked by a black hole, by varying the mobility of the nodes in the network. The success of any kind of network is strongly determined by the confidence people have in its security. It is therefore crucial for both wired and wireless networks to be secured so as to offer protected communication [1].

A Mobile Ad-hoc Network (MANET) is a group of mobile devices that can spontaneously interconnect and share resources via wireless communication channels, with no fixed network infrastructure or central management. MANETs can be assembled quickly and at little cost because they do not require central monitoring or fixed network infrastructure. Mobile nodes in a MANET do not necessarily have to be of the same type. They can be PDAs, laptops, mobile phones, routers and printers, as illustrated by Figure 1. The nodes are equipped with antennas which operate as wireless transmitters and receivers. The antennas may be omnidirectional, highly directional, or a combination. The mobile nodes are resource constrained in terms of bandwidth and battery power [2, 3].

MANETs are suitable for providing communications in many applications, particularly in cases where it is not possible to set up a network infrastructure. For instance, in a military operation, where there may be geographical barriers between participants, a MANET can be set up to facilitate communication. Also, because it is easy to set up, a MANET may be used to replace damaged network infrastructure in disaster recovery operations where temporary network infrastructure is immediately needed [4, 5].

Based on: “Effect of Varying Node Mobility in the Analysis of Black Hole Attack on MANET Reactive Routing Protocols”, by L. Mejaele and E.O. Ochola, which appeared in the Proceedings of Information Security South African (ISSA) 2016, Johannesburg, 17 & 18 August 2016. © 2016 IEEE

Based on: “Effect of Varying Mobility in the Analysis of Black Hole Attack on MANET Reactive Routing protocols”, by E.O. Ochola, L.F. Mejaele, M.M. Eloff and J.A. van der Poll which appeared in the Proceedings of Information Security South African (ISSA) 2016, Johannesburg, 17 & 18 August 2016. © 2016 IEEE

Figure 1: Mobile Ad-hoc Network

The features of MANETs expose them to many more security attacks than traditional networks. The high mobility and dynamic topology of MANETs make routing very challenging, which is why early research on MANETs mostly concentrated on developing routing mechanisms that are efficient for a dynamic and resource-constrained MANET. The security of protocols was given less attention when MANET routing protocols were defined. The black hole attack aims to disrupt the routing process of MANETs [1].

This paper aims to analyse the performance of MANET reactive routing protocols when attacked by a black hole. The two reactive routing protocols that are compared in the analysis are Ad-hoc On-demand Distance Vector (AODV) and Dynamic Source Routing (DSR). The mobility of the nodes in the network is varied during the analysis to determine the impact that mobility has on MANET performance and to discover the protocol that is preferable in a high mobility network. The effect of the black hole attack is tested on reactive routing protocols because the attack targets the route discovery process and can easily attack reactive protocols, since they discover routes frequently.

The rest of the paper is structured as follows: Section 2 discusses the vulnerabilities of MANETs that expose them to attacks. Section 3 describes routing in MANETs and discusses the different categories of routing protocols, focusing more on reactive routing protocols. Section 4 explains the black hole attack, and describes some of the solutions that have been suggested to lessen the impact of the attack. Section 5 gives the simulation structure used to perform the analysis, presents the results obtained from the simulations and gives the analysis of the results. Section 6 concludes the paper.

2. VULNERABILITIES OF MANETS

It is quite challenging to maintain security in MANETs because they have far more vulnerabilities than wired networks [6]. Any weakness in a security system is a vulnerability. Some of the vulnerabilities of MANETs are presented below.

2.1 Lack of secure boundaries

The nodes in a MANET are at liberty to move inside the network, and to join and leave the network at any time. This makes it challenging to establish a security wall, in contrast to traditional wired networks, which have a clear line of defence. In order to attack wired networks, adversaries must physically gain access to the network medium and pass through firewalls and gateways before they can act maliciously against the target nodes in the network. However, in a MANET the adversary can communicate with nodes within its transmission range, and become part of the network without any physical access to the network. The absence of secure boundaries means that a MANET can be attacked at any time by any malicious node that is within the transmission range of any node in the network [7].

2.2 Lack of centralised management facility

There is no central equipment such as a server for monitoring the nodes in the network, and this increases the vulnerability of MANETs. Firstly, it becomes very difficult to detect attacks in the absence of central control because the traffic in an ad-hoc network is very dynamic [8]. Secondly, the lack of centralised management delays the nodes’ trust management. It becomes difficult to classify the nodes in advance as trustworthy or untrustworthy because the security of the nodes cannot be presumed. Consequently, the nodes cannot be distinguished as trusted or non-trusted. Thirdly, the lack of centralised authority can sometimes lead to decentralised decision-making. In MANETs, important algorithms depend on all nodes participating cooperatively. Hence, an attacker can take advantage of this vulnerability and execute attacks that prevent the nodes from cooperating [9].

2.3 Threats from compromised nodes in the network

Each mobile node operates independently, which means it is free to join or leave the network at any time. It therefore becomes difficult for the nodes to set rules and strategies that can prevent malicious behaviour of other nodes in the mobile network. Also, due to the freedom of movement of the nodes, a compromised node can target different nodes in the network. Hence, it becomes quite challenging to identify malicious actions of a compromised node in the network, particularly in a large network. As a result, internal attacks from compromised nodes are more severe than external attacks because they are not easily identified: a compromised node operated normally before it was compromised [7].


2.4 Restricted power supply

Mobile devices in a MANET get energy from batteries or other exhaustible means, so their energy is limited. An attacker can exploit this energy restriction to cause denial of service: being aware of the battery restriction, it can endlessly forward packets to the target node or involve the target node in time-consuming activities. This exhausts the battery and the target node stops operating. In addition, the limited power supply may cause a node in a MANET to behave selfishly by not participating cooperatively in the network activities as a way to save its limited battery. This becomes a problem particularly when it is essential for the node to cooperate with other nodes [10].

3. ROUTING IN MANETS

The topology of MANETs keeps changing rapidly due to the free movement of nodes, which join and leave the network at any time. Routing is important in order to discover the current topology so that an updated route to a certain node can be established and a message relayed to the correct destination [3, 11]. The traditional routing protocols such as distance vector and link state protocols that have been structured for hard-wired networks cannot be directly applied to MANETs. This is because of mobility and dynamic topology, which are the fundamental characteristics of MANETs [12]. In order to overcome routing challenges in MANETs and attain effective routing, a number of routing protocols are defined specifically for MANETs. These protocols can be categorised into proactive, reactive and hybrid protocols based on the way paths are established and maintained by the nodes [6]. The hierarchy of the protocols is shown in Figure 2, illustrating the two reactive routing protocols discussed and analysed in this paper.

Figure 2: MANET protocols hierarchy

3.1 Proactive protocols

These are table-driven routing protocols that try to keep a record of fresh and updated network routes. All the nodes in the network have a table to store the routing information [8]. The nodes exchange topology information so that they can all have the same view of the network. The exchanged information helps to reflect any changes in the topology. Whenever a node needs to send messages, it just searches the routing table for the path to the destination. The sending of the message is not delayed by the remote route discovery [11]. Maintaining an up-to-date topology in the routing tables causes a high control overhead.

3.2 Reactive protocols

Reactive protocols are on-demand routing protocols. As the name suggests, the routes to destination nodes are established only when the nodes must send data to a destination whose route is unknown. This implies that the source node initiates the search for routing paths only when needed. When a node wants to send data to a destination node, it starts a route discovery process within the network. Compared to proactive protocols, the control overhead in reactive protocols is reduced; however, the route searching process that occurs before data packets can be forwarded may cause the source node to suffer long delays [16]. Reactive protocols use route discovery and route maintenance processes as explained below:

Route Discovery: The route discovery process is a cycle that involves a broadcast route request and a unicast reply that consists of paths that have been discovered [17]. All the nodes in the network keep a record in a routing table. This record consists of information about neighbouring nodes that can forward the packets so that they reach the destination. When a source node wants to send data packets to a destination node, and there is no routing information regarding the destination node in the routing table, the source node initiates a route discovery process [18]. In discovering the route, a source node broadcasts a route request (RREQ) packet [19].

When the RREQ packet reaches any node in the network, the node compares the destination IP address to its own IP address to determine whether it is the destination node. The node sends back a route reply (RREP) packet if it is the destination, but if it is not, it searches for a route to the destination in its routing table. If there is no route, it broadcasts the RREQ packet to nearby nodes. If there is a route to the destination in its routing table, the node compares the RREQ packet sequence number with the destination sequence number in the table to find out whether the route is up to date. The route in the routing table is considered fresh and updated if the destination sequence number in the table is higher than the sequence number attached to the RREQ packet. The intermediate node with an updated route uses the reverse route to send a unicast RREP packet to the source node, and once the source node has received a RREP packet, it begins to send messages through this route. If the route in the table is not fresh enough, the node forwards the RREQ packet to its neighbours [18, 20]. Figure 3 summarises the route discovery process in reactive protocols.

Figure 3: Route discovery in reactive protocols
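The decision each node makes on receiving a RREQ, as described above, can be sketched as follows. This is a minimal illustration rather than an implementation of AODV or DSR: the class names, field names and the string return values are assumptions made for the example, and the freshness test uses the strict "higher than" comparison stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Rreq:
    source: str
    destination: str
    dest_seq: int  # sequence number attached to the RREQ packet


@dataclass
class Node:
    address: str
    # routing table: destination -> (next_hop, destination sequence number)
    routes: dict = field(default_factory=dict)

    def handle_rreq(self, rreq):
        """Return the action this node takes on receiving a route request."""
        if rreq.destination == self.address:
            return "RREP"  # this node is the destination
        entry = self.routes.get(rreq.destination)
        if entry is None:
            return "REBROADCAST"  # no route known: pass the RREQ on
        _next_hop, table_seq = entry
        if table_seq > rreq.dest_seq:
            return "RREP"  # route is fresh: reply along the reverse route
        return "REBROADCAST"  # route is stale: keep searching


node_c = Node("C", routes={"D": ("E", 12)})
print(node_c.handle_rreq(Rreq("A", "D", dest_seq=10)))
print(node_c.handle_rreq(Rreq("A", "D", dest_seq=12)))
```

In the first call the table entry is fresher than the request, so the intermediate node replies; in the second it is not, so the request is rebroadcast.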

Route Maintenance: During operation, if the source node changes position, it has to establish a new route to the destination by reinitiating route discovery process. But if an intermediate node or a destination node changes position, then any node that notices a damaged link sends a route error (RERR) packet. A RERR packet is relayed to every node that utilizes the affected link for their communication to other nodes. When a RERR packet is received by the source node, it can stop sending the data, or send a new RREQ packet [20].

Figure 4: Route maintenance in reactive protocols

In Figure 4, if Node A leaves the network, Node F, which is in the communication range of Node A and Node B, will not get a HELLO message from Node A, and that is how Node F discovers that Node A has moved. The route through Node A is then marked as invalid by Node F and a RERR message is transmitted to Node B to notify it that Node A is not a neighbour anymore.
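The maintenance step in this example can be sketched as a periodic check on when each neighbour was last heard. The data layout, the timeout value and the destination name "D" are hypothetical choices for illustration; the source does not specify them.

```python
def check_neighbours(last_hello, now, timeout, routes):
    """Invalidate routes through silent neighbours and list the RERRs to send.

    last_hello: neighbour -> time the last HELLO message was heard
    routes: destination -> {"next_hop": str, "valid": bool, "precursors": set}
    Returns a list of (recipient, unreachable_destination) pairs.
    """
    rerrs = []
    for neighbour, heard_at in last_hello.items():
        if now - heard_at <= timeout:
            continue  # neighbour is still within range
        for dest, route in routes.items():
            if route["valid"] and route["next_hop"] == neighbour:
                route["valid"] = False  # mark the route through it as invalid
                for precursor in sorted(route["precursors"]):
                    rerrs.append((precursor, dest))
    return rerrs


# Node F's view: Node A has been silent, Node B is still sending HELLOs.
last_hello = {"A": 1.0, "B": 9.5}
routes = {"D": {"next_hop": "A", "valid": True, "precursors": {"B"}}}
rerrs = check_neighbours(last_hello, now=10.0, timeout=3.0, routes=routes)
print(rerrs)
```

Here Node F would send one RERR to Node B, since Node B is the only node recorded as using the route through Node A.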

3.3 Hybrid protocols

Hybrid protocols are a mixture of proactive and reactive protocols. Their design merges the benefits of both proactive and reactive protocols to yield better results [14]. The hierarchical network model is used to structure the majority of hybrid routing protocols. Firstly, all the routing information that is unknown is acquired by using proactive routing. Then reactive routing mechanisms are used to maintain the routing information when the topology changes [15].

4. BLACK HOLE ATTACK

The proper functioning of MANETs depends on the mutual agreement and understanding between the nodes in the network; however, some nodes may become malicious and misbehave. The black hole attack is one of the harmful attacks caused by a malicious node that misbehaves in a network [21]. A malicious node exploits the process of discovering routes in reactive routing protocols. When a source node broadcasts a route request, a malicious node misleads other nodes by claiming to have the best path to the destination. The best path is determined by the shortness and freshness of the route. The malicious node achieves this by unicasting false route replies, directing data packets to be routed through it and then discarding them instead of forwarding them [22]. A malicious node can work independently to launch the attack, which is referred to as a single black hole attack, or malicious nodes can work as a group, which is referred to as a cooperative black hole attack [15].

4.1 Black hole attack categories

The black hole attack can also be classified into two categories based on the cause of the attack: Black hole attack caused by RREP and that caused by RREQ [13] as illustrated in Figure 5 and Figure 6 respectively.

Figure 5: Black hole attack via RREP

Figure 6: Black hole attack through RREQ


In Figure 5, the black hole sends a forged RREP message pretending to have a fresh and short path to the destination. This means the black hole always returns a positive RREP even when it has no valid route to the destination. The data packets that are transmitted to the destination will therefore pass through the malicious node, which will silently absorb or discard them. In Figure 6, the black hole sends a forged RREQ message to attack a target node in the network. This black hole pretends to be rebroadcasting the RREQ packet that originated from a target node in the network. It then adds itself as the next hop in the route record, so all messages destined for the target node will pass through it and it will discard the messages.
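A small sketch shows why a forged RREP of the kind in Figure 5 wins route selection. It assumes the source prefers the freshest reply and breaks ties by hop count, which follows the "shortness and freshness" criterion described earlier; the node names and numeric values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rrep:
    sender: str
    dest_seq: int   # claimed destination sequence number (freshness)
    hop_count: int  # claimed path length


def select_route(replies):
    """Pick the freshest reply, breaking ties by the shortest path."""
    return max(replies, key=lambda r: (r.dest_seq, -r.hop_count))


honest = [Rrep("B", dest_seq=14, hop_count=3), Rrep("C", dest_seq=15, hop_count=4)]
forged = Rrep("M", dest_seq=10_000, hop_count=1)  # black hole's fabricated claim

print(select_route(honest).sender)
chosen = select_route(honest + [forged])
print(chosen.sender)
```

Because nothing in the basic selection rule checks the claim, the fabricated sequence number and hop count are enough to attract the traffic.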

4.2 Black hole attack mitigations

Various studies have been carried out to detect and mitigate the black hole attack in MANETs [29]. The techniques were tested on AODV-based MANETs. However, none of the existing black hole attack mitigations provides a solution that prioritises the detection and elimination of a malicious node based on its closer proximity to the source node, yet such closeness during the route discovery process is considered to be more dangerous to network performance. Some of the mitigation techniques, with similar omissions, are discussed below:

Detection, Prevention and Reactive AODV (DPRAODV): In [23], DPRAODV is proposed. In this scheme, the AODV protocol is modified to have a new control packet called ALARM and a threshold value. The threshold value is the average of the difference between the destination sequence number in the routing table and the sequence number in the RREP packet. In the usual operation of AODV, the node that gets a RREP packet checks the value of the sequence number in its routing table. The sequence number of the RREP packet has to be higher than the sequence number in the routing table in order for the RREP to be accepted. In DPRAODV, the RREP sequence number is additionally compared with the threshold value, and if the RREP sequence number is greater than the threshold value, the sender is considered malicious and added to the black list. The neighbouring nodes are notified using an ALARM packet so that the RREP packet from the malicious node is not processed and gets blocked. The threshold value is updated automatically using the data collected in each time interval. This updating of the threshold value helps to detect and stop black hole attacks. The ALARM packet contains the black list that includes the malicious node. This list helps the neighbouring nodes to reject any RREP packet sent by a malicious node. Any node that gets a RREP packet looks into the black list, and if the reply comes from a node that has been blacklisted, it is ignored and further replies from that node will be discarded. Thus the ALARM packet isolates a malicious node from the network.
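The checks described for DPRAODV can be sketched as a small filter. This is an interpretation, not the scheme from [23]: it assumes the threshold is compared against the difference between the RREP sequence number and the routing-table value (since the threshold is defined as an average of such differences), and the class name, the ALARM dictionary layout and the way samples are collected per interval are all hypothetical.

```python
class DprAodvFilter:
    """Illustrative RREP filter with a dynamic threshold and a black list."""

    def __init__(self, initial_threshold):
        self.threshold = initial_threshold
        self.black_list = set()
        self._differences = []  # differences observed in the current interval

    def accept_rrep(self, sender, rrep_seq, table_seq):
        """Return (accepted, alarm); alarm carries the black list when raised."""
        if sender in self.black_list:
            return False, None  # replies from blacklisted nodes are ignored
        if rrep_seq <= table_seq:
            return False, None  # ordinary AODV freshness rule
        difference = rrep_seq - table_seq
        if difference > self.threshold:
            self.black_list.add(sender)  # sender is considered malicious
            alarm = {"type": "ALARM", "black_list": sorted(self.black_list)}
            return False, alarm
        self._differences.append(difference)
        return True, None

    def update_threshold(self):
        """At the end of an interval, set the threshold to the mean difference."""
        if self._differences:
            self.threshold = sum(self._differences) / len(self._differences)
            self._differences.clear()


rrep_filter = DprAodvFilter(initial_threshold=50)
first = rrep_filter.accept_rrep("B", rrep_seq=105, table_seq=100)
second = rrep_filter.accept_rrep("M", rrep_seq=10_000, table_seq=100)
third = rrep_filter.accept_rrep("M", rrep_seq=101, table_seq=100)
print(first, second, third)
```

The third call shows the isolation effect: once a node is on the black list, even a plausible-looking reply from it is discarded.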

Intrusion Detection System AODV (IDSAODV): IDSAODV is proposed in [24] in order to decrease the impact of a black hole. This is achieved by altering the way normal AODV updates the routing process. The routing update process is modified by adding a procedure to disregard the route that is established first. The reasoning behind this method is that an attacked network receives many RREP packets from various paths, so it is assumed that the first RREP packet is generated by a malicious node. The assumption is based on the fact that a black hole node just sends a fake RREP packet, without searching through the routing table. Therefore, to avoid updating the routing table with a wrong route entry, the first RREP is ignored. This method improves packet delivery but it has two limitations: the first RREP may be received from an intermediate node that has an updated route to the destination node, and if the RREP message from a malicious node arrives second at the source node, the method is not able to detect the attack.
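The rule and its second limitation can be seen in a few lines. The sketch assumes that, once the first reply has been discarded, the next reply to arrive is the one used for routing; the function name and the node labels are illustrative only.

```python
def choose_reply_ignoring_first(replies):
    """Return the reply used for routing once the first arrival is discarded.

    replies: RREP senders in order of arrival at the source node.
    Returns None when no second reply ever arrives.
    """
    later = replies[1:]  # the first RREP is assumed to come from a black hole
    return later[0] if later else None


black_hole_first = choose_reply_ignoring_first(["M", "B", "C"])
black_hole_second = choose_reply_ignoring_first(["B", "M", "C"])
print(black_hole_first, black_hole_second)
```

When the malicious node M replies first it is skipped, but when its reply arrives second it is accepted and a legitimate reply is thrown away instead.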

Enhanced AODV (EAODV): In [25], the authors proposed EAODV. Similar to IDSAODV, EAODV allows numerous RREPs from various paths to lessen the effect of the black hole attack. This method assumes that the actual destination node will eventually unicast a RREP packet, so the source node overlooks all previous RREP entries, including the ones from the malicious node, and takes the latest RREP packet. The source node keeps updating its routing table with the RREPs being received until a RREP from the actual destination is received. Then all RREPs are analysed, and suspicious nodes are discovered and isolated from the network. The limitation of this method is that it adds two processes that increase delay and exhaust the energy of the nodes.

Secure AODV (SAODV): The authors in [26] proposed a secure routing protocol, SAODV, that addresses the black hole attack in AODV. The difference between AODV and SAODV is that SAODV uses random numbers to verify the destination node. An extra verification packet is introduced in the route discovery process. After getting a RREP packet, the source node stores it in the routing table, then immediately sends a verification packet along the reverse route of the received RREP. The verification packet contains a random number created by the source node. When two or more verification packets from the source node are received at the destination node, coming from different routes, the destination node stores them in its routing table and checks whether they contain the same random number. If the verification packets arriving along different paths contain the same random number, a verification confirm packet is sent by the destination node to the source node. The confirm packet contains a random number generated by the destination node. If the verification confirm packets contain different random numbers, the source node waits until at least two verification confirm packets contain the same random number. When two or more verification confirm packets with the same random number are received by the source node, it uses the shortest route to send data to the destination node. The security mechanism in this protocol is that a malicious node pretending to be the destination node will not be able to send the correct verification confirm packet to the source node.
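The final step at the source node, waiting for agreeing confirm packets and then choosing a route, can be sketched as below. It assumes the shortest route is chosen among the routes whose confirm packets agree; the packet representation, route labels and hop counts are hypothetical and not taken from [26].

```python
from collections import Counter


def agreed_random_number(confirm_packets):
    """Return the random number carried by at least two confirm packets, else None.

    confirm_packets: list of (route, random_number) pairs received so far.
    """
    counts = Counter(number for _route, number in confirm_packets)
    for number, count in counts.most_common():
        if count >= 2:
            return number
    return None


def pick_route(confirm_packets, hop_counts):
    """Choose the shortest route among those carrying the agreed number."""
    number = agreed_random_number(confirm_packets)
    if number is None:
        return None  # keep waiting for more confirm packets
    candidates = [route for route, n in confirm_packets if n == number]
    return min(candidates, key=lambda route: hop_counts[route])


hop_counts = {"via-M": 1, "via-B": 3, "via-C": 4}
packets = [("via-M", 111), ("via-B", 742)]
waiting = pick_route(packets, hop_counts)   # no two packets agree yet
packets.append(("via-C", 742))
selected = pick_route(packets, hop_counts)  # two genuine confirms now agree
print(waiting, selected)
```

The route through the impostor M is the shortest, but it is never selected because its confirm packet does not match the others.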

Trust-based approach: The authors in [27] suggested a trust-based approach to mitigate the black hole attack. In this approach, every node keeps a trust value for each of its neighbours. The trust value is computed as the proportion of discarded packets to forwarded packets. Each node checks that the neighbouring node forwards the packets sent to it, unless the packet is destined for the neighbouring node itself. To check that the packets are forwarded, each node implements a caching mechanism: it stores the packet being forwarded to the neighbouring node in its cache, and then promiscuously monitors the neighbouring node to check whether it forwards the packet. If the neighbouring node forwards the packet, the monitoring node compares it with the packet stored in its cache, and assumes the packet has been forwarded if they match. Otherwise, after a set time the node assumes the packet has been discarded by its neighbour, and the neighbouring node is suspected to be malicious. All the nodes in the network thus get to know the behaviour of their neighbouring nodes, and can therefore periodically assign trust values that represent the trustworthiness of the neighbouring nodes. All RREP packets from a node that has been recognised as malicious are ignored, and routes are only selected through trusted nodes. A trust-based solution is further suggested in [28], where each node calculates a trust value for its neighbouring nodes. If the trust value falls below a certain threshold, the node excludes the neighbour from future routes. This solution was simulated on NS-2 and showed much better results in scenarios where the AODV protocol is under attack.
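The caching and monitoring mechanism can be sketched for a single neighbour as follows. The sketch interprets the measured quantity as the fraction of handed-over packets that were never overheard being forwarded; how [27] or [28] turn such a figure into a trust value or threshold is not reproduced here, and the class and method names are assumptions.

```python
class Watchdog:
    """Tracks whether one neighbour forwards the packets handed to it."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}  # packet id -> time the packet was handed over
        self.forwarded = 0
        self.discarded = 0

    def sent(self, packet_id, now):
        self.pending[packet_id] = now  # cache the packet awaiting confirmation

    def overheard(self, packet_id):
        # The neighbour was overheard retransmitting a packet matching the cache.
        if self.pending.pop(packet_id, None) is not None:
            self.forwarded += 1

    def expire(self, now):
        # Packets not overheard within the timeout are assumed discarded.
        late = [p for p, t in self.pending.items() if now - t > self.timeout]
        for packet_id in late:
            del self.pending[packet_id]
            self.discarded += 1

    def drop_fraction(self):
        """Fraction of resolved packets that were discarded (None if none yet)."""
        total = self.forwarded + self.discarded
        return self.discarded / total if total else None


watchdog = Watchdog(timeout=0.5)
watchdog.sent(1, now=0.0)
watchdog.sent(2, now=0.1)
watchdog.overheard(1)      # packet 1 was forwarded
watchdog.expire(now=1.0)   # packet 2 was never overheard
print(watchdog.drop_fraction())
```

A black hole neighbour would drive this fraction towards one, at which point its RREP packets could be ignored as described above.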

Solution using packet sequence number: In the regular operation of AODV, the source node compares the sequence number of the RREP with the sequence number in its routing table. The RREP packet is accepted only if its sequence number is higher than the sequence number in the source’s routing table. A solution that requires two additional small tables in every node is proposed in [5]. One table records the sequence number of the last packet sent by the node, and the other records the sequence number of the last packet received from every node. The tables are updated every time a packet is received or sent by a node. During the route discovery process, the source node broadcasts an RREQ packet to nearby nodes. The destination node, or an intermediate node that has a fresh route to the destination, replies to the sender with an RREP packet that contains the sequence number of the last packet received from the source node. The source node then verifies that the sequence number in the received RREP matches the record in its table; if it does not, the RREP packet is suspected to be from a malicious node. Since the sequence number is already part of communication in the base protocol, this solution does not add overhead to the transmission channel, and it makes a suspicious reply easy to recognize.
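The two-table check can be sketched as follows. This is an illustrative Python model, not the scheme's actual packet format: the `Node` class, the dictionary-based RREP, the field name `echo_seq`, and keeping the "last sent" table per peer are all assumptions.

```python
class Node:
    """Hypothetical node holding the two extra tables described above."""

    def __init__(self, name):
        self.name = name
        self.last_sent = {}      # peer name -> sequence number of last packet sent to it
        self.last_received = {}  # peer name -> sequence number of last packet received from it

    def send(self, peer, seq):
        # Both tables are updated whenever a packet is sent or received.
        self.last_sent[peer.name] = seq
        peer.last_received[self.name] = seq

    def build_rrep(self, source_name):
        # An honest replier echoes the last sequence number it received from the source.
        return {"from": self.name, "echo_seq": self.last_received.get(source_name)}

    def rrep_is_suspicious(self, rrep):
        # The source compares the echoed number with its own record for that replier.
        return rrep["echo_seq"] != self.last_sent.get(rrep["from"])
```

For example, after `s.send(d, 7)` an RREP built by `d` echoes 7 and passes the check at `s`, whereas a reply from a node that never received a packet from `s` cannot echo a matching number unless it guesses it.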

None of the existing solutions discussed above analyses the black hole attack in terms of the malicious node’s location relative to the source node, which makes it necessary to conduct the experiments presented in section 5.2.

5. SIMULATIONS AND RESULTS

This section presents and discusses results of simulations that were conducted under varied parameters to analyse the performance of selected reactive routing protocols (AODV and DSR). The simulations were carried out at different node mobility speeds, and the network performance was analysed under normal conditions (i.e., no black hole node) and when under attack by a black hole node. The AODV routing protocol was given further attention under a different setup, as discussed in section 5.2, after it performed dismally in comparison to DSR, as presented in section 5.1.

5.1 AODV vs DSR in a highly dynamic network

The results are obtained from simulations implemented on Network Simulator 2 (NS-2) and are presented using graphs. NS-2 is freely distributed and open source, which allows the creation of new protocols and the modification of existing ones, so it is possible to introduce a black hole attack in NS-2 by modifying its source code [28]. A typical simulation with NS-2 consists of creating a scenario file that defines the position and movement patterns of the nodes, and a communication file that defines the connections and traffic in the network. Each simulation run produces a detailed trace file that shows the events (such as the number of packets delivered successfully) that happened during the simulation. Figure 7 illustrates the NS-2 simulation process.

Figure 7: NS-2 Simulation Process

The simulation parameters used in this sub-section are shown in Table 1.


Table 1: AODV and DSR simulation parameters

Parameter                     Values
Simulator                     NS-2.35
Mobility Model                Random Waypoint [13]
Simulation Time               500 seconds
Terrain Area                  670m x 670m
Number of nodes               20
Number of black hole nodes    1
Traffic Type                  CBR (UDP)
Packet Size                   512 bytes
Routing Protocols             AODV, DSR
Transmission Rate             4 packets/sec
Maximum Speed                 20 – 80 m/s
Pause Time                    0 seconds
Transmission Range            250m

Sub-section results analysis: The performance metrics used are throughput, packet delivery ratio and end-to-end delay. In order to analyse the effect of mobility, the speed at which the nodes move was varied from 20m/s to 80m/s to create different scenarios. The total number of nodes and maximum number of connections were kept constant at 20 and 10 respectively. The results show the effect of mobility for both AODV and DSR protocols when the network is under a black hole attack and when there is no black hole attack.
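The three metrics can be computed from per-packet send and receive records. The sketch below uses the conventional definitions (received bits per unit time, delivered packets over sent packets, and mean receive-minus-send time); the text does not spell these formulas out, so they are an assumption, and the simplified record format is hypothetical and does not follow the NS-2 trace file layout.

```python
def summarise(sent, received, duration_s):
    """Return (throughput in kbps, packet delivery ratio in %, mean delay in ms).

    sent:     {packet_id: send_time_s}
    received: {packet_id: (receive_time_s, size_bytes)}
    """
    # Only packets that were actually sent count as delivered.
    delivered = [pid for pid in received if pid in sent]

    # Throughput: delivered payload bits per second, expressed in kbps.
    bits = 8 * sum(received[pid][1] for pid in delivered)
    throughput_kbps = bits / duration_s / 1000.0

    # Packet delivery ratio: delivered packets as a percentage of sent packets.
    pdr_percent = 100.0 * len(delivered) / len(sent) if sent else 0.0

    # End-to-end delay: mean time between sending and receiving a packet.
    delays = [received[pid][0] - sent[pid] for pid in delivered]
    delay_ms = 1000.0 * sum(delays) / len(delays) if delays else 0.0

    return throughput_kbps, pdr_percent, delay_ms
```

Packets dropped by a black hole simply never appear in `received`, so they lower the throughput and the packet delivery ratio but do not enter the delay average.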

Throughput: The simulation results of Figure 8 show that increasing the speed of the nodes in the network does not bring significant change in throughput. For both protocols, throughput decreases slightly. This is caused by the rapid change of positions of the nodes, which may cause the path to the destination to change while some packets have been transmitted from the source node using the old route. Therefore the transmitted packets get lost on the way. Throughput of the network under black hole attack decreases because the malicious node discards some of the packets. AODV’s throughput drops drastically compared to DSR’s throughput.

[Figure 8 plot. x-axis: Speed (m/s), 20 to 80; y-axis: Throughput (kbps), 0 to 140; series: AODV, Black hole AODV, DSR, Black hole DSR.]

Figure 8: Throughput AODV vs. DSR

Packet delivery ratio: When the mobility of the nodes is increased, the packet delivery ratio decreases slightly, as illustrated in Figure 9. This is because some packets may get lost on the way to the destination when the path from the source node to the destination node changes due to rapid changes in the intermediate nodes’ positions. The packet delivery ratio of AODV is very low compared to that of DSR when the black hole attack has been launched against the network.

[Figure 9 plot. x-axis: Speed (m/s), 20 to 80; y-axis: Packet Delivery Ratio (%), 0 to 100; series: AODV, Black hole AODV, DSR, Black hole DSR.]

Figure 9: Packet delivery ratio AODV vs. DSR

End-to-end delay: Figure 10 shows that end-to-end delay decreases with increase in speed because the nodes’ movement gets more frequent and the routing updates are regularly exchanged. When there is a black hole attack, end-to-end delay gets even lower because the malicious node pretends to have a valid route to the destination without checking in the routing table, so the route discovery process takes a shorter time.

[Figure 10 plot. x-axis: Speed (m/s), 20 to 80; y-axis: End-to-end delay (ms), 0 to 180; series: AODV, Black hole AODV, DSR, Black hole DSR.]

Figure 10: End-to-end delay AODV vs. DSR


5.2 A focus on AODV at low mobility and dense network

This sub-section presents the simulation of a black hole node, implemented by modifying the existing AODV implementation in NS-3.16. The routing protocol was modified by introducing a black hole flag on the node that exhibits black hole attack features, i.e., the node that replies positively to every received route request, thereby acting as the communication end point. Figure 11 shows the simulation network grid and a node’s transmission range within the grid. Figure 12 illustrates a route chosen by the AODV routing protocol for a successful PING from node 0 to node 99. The route followed by the packet keeps changing as the network topology changes, and once a black hole node is introduced into the network, the malicious node may attract traffic and become one of the intermediaries. The effectiveness of a black hole node is determined by its grid position at the time of packet transmission in relation to the positions of the source and destination nodes. Nodes were set to have minimal movement according to the implemented mobility model. The applicable modifications to the simulation setup are shown in Table 2.
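The effect of such a flag can be illustrated with a toy Python model. This is not NS-3 code and does not use the NS-3 API: the class `AodvNode`, its methods, the dictionary-based RREP, and the rule that the forged reply advertises a sequence number one above the requested value are assumptions for illustration.

```python
class AodvNode:
    """Toy AODV node with a hypothetical black hole flag."""

    def __init__(self, node_id, is_black_hole=False):
        self.node_id = node_id
        self.is_black_hole = is_black_hole
        self.routes = {}  # destination -> (next_hop, destination_sequence_number)

    def handle_rreq(self, destination, requested_seq):
        """Return an RREP dict, or None when the RREQ should be rebroadcast."""
        if self.is_black_hole:
            # Replies positively to every request without consulting the routing table.
            return {"from": self.node_id, "dest": destination, "dest_seq": requested_seq + 1}
        if destination == self.node_id:
            return {"from": self.node_id, "dest": destination, "dest_seq": requested_seq}
        route = self.routes.get(destination)
        if route is not None and route[1] >= requested_seq:
            # Honest intermediate node with a sufficiently fresh route.
            return {"from": self.node_id, "dest": destination, "dest_seq": route[1]}
        return None

    def handle_data(self, destination):
        """Return 'delivered', 'dropped' or 'forward' for a data packet."""
        if destination == self.node_id:
            return "delivered"
        if self.is_black_hole:
            # The black hole acts as the end point and silently discards traffic.
            return "dropped"
        return "forward" if destination in self.routes else "dropped"
```

With this model, a flagged node answers a request for node 99 even with an empty routing table, and any data packet routed through it ends there.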

Table 2: Modified AODV simulation parameters

Parameter                     Values
Simulator                     NS-3.16
Mobility Model                Random Walk [13]
Number of nodes               100
Number of black hole nodes    1
Routing Protocols             AODV

Figure 11: Simulation grid and node range

Figure 12: Ping message transmissions

Sub-section evaluation of results: Simulation runs were conducted in a network with and without a black hole node. The scenarios in which a black hole node was present indicated that the attack had a devastating effect on network performance. This was evident from the fact that all traffic destined for node 99 via a black hole was dropped at the malicious node. However, packets were delivered successfully in scenarios where the black hole node was not an intermediary during the packet relay, e.g., when the malicious node was at the grid peripheries. This did not occur in most of the simulation runs, because the long transmission ranges of the nodes mostly enabled black hole nodes to be involved in the route. With shorter transmission ranges or a densely populated network, however, there could be scenarios in which the route misses the black hole, leading to successful transmission. Even with no packet drops from attackers, a few cases of unsuccessful packet delivery could still be recorded due to the nature of MANETs, such as wireless channel errors and path breaks resulting from the dynamic topology, which lead to the generation of RERR notification messages.

Black hole nodes located closer to the source node at the time of transmission had a severe negative impact on network performance, as shown in Figure 13; an example was when node 25 in Figure 12 was selected as the black hole. Figure 13 shows a complete communication breakdown when the black hole node was introduced. The figure also shows the performance of an attack-free network, i.e., how the MANET performed normally under similar settings. However, with an increased number of PING requests and the black hole positioned further away from the source node, the network had normal packet transmissions for a while, until the black hole node was finally encountered, as shown in Figure 14; an example was when node 73 in Figure 12 was set as the black hole. This indicates that black hole nodes in a network may go unnoticed unless they are encountered along the route.

[Figure 13 plot. x-axis: Ping packet sequence number, 0 to 9; y-axis: Ping response (ms), -5 to 35; series: Normal AODV, Black hole AODV.]

Figure 13: Effect of Black hole node closer to source node at transmit time

[Figure 14 plot. x-axis: Simulation time (sec), 0 to 60; y-axis: Packet response time (ms), -5 to 40.]

Figure 14: Effect of black hole node that is far away from the source node during message transmissions

[Figure 15 histogram. x-axis: Grid positions, in bins [1-22], [23-43], [44-64], [65-85], [86-106]; y-axis: Number of black hole encounters, 0 to 30.]

Figure 15: Black hole nodes’ distribution on the grid

A histogram displaying the distribution of black hole nodes within the network grid is shown in Figure 15. The data were accumulated at the end of the simulation runs. These values depict a well-distributed selection of black hole nodes over the network grid, as usually occurs in practical scenarios. This is necessary for validating the results.

[Figure 16 pie chart. Complete loss: 67%; No loss: 21%; Partial loss: 11%.]

Figure 16: Data loss due to black hole attacks

Figure 16 presents the destructive nature of the attack: 67% of the transmissions completely failed to be delivered, 11% experienced partial loss (i.e., delivered with errors), and only 21% were successful. The successful packet deliveries occurred whenever a route did not include a black hole node as an intermediary. However, all packets were dropped (i.e., complete loss) in all the cases where a black hole was part of the route during packet transmission. The data presented in Figure 16 were collected from the 100 simulation runs performed.
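A breakdown of this kind can be tallied from per-run counts, as in the following Python sketch. The classification rule (zero packets delivered, all delivered, or anything in between) and the input format are assumptions for illustration; the paper does not state how each run was labelled, and no data from the paper are reproduced here.

```python
from collections import Counter


def classify_run(packets_sent, packets_delivered):
    # Hypothetical rule: label a run by how many of its packets arrived.
    if packets_delivered == 0:
        return "complete loss"
    if packets_delivered == packets_sent:
        return "no loss"
    return "partial loss"


def loss_breakdown(runs):
    """runs: list of (packets_sent, packets_delivered); returns percentages per class."""
    counts = Counter(classify_run(sent, delivered) for sent, delivered in runs)
    total = len(runs)
    return {label: 100.0 * count / total for label, count in counts.items()}
```

For instance, `loss_breakdown([(10, 0), (10, 0), (10, 10), (10, 7)])` assigns half of these made-up runs to complete loss and a quarter each to the other two classes.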

[Figure 17 plot. x-axis: Simulation run (sec), 1 to 6; y-axis: Black hole node position, 0 to 100; series: Black hole node.]

Figure 17: Black hole nodes with minimal impact


[Figure 18 plot. x-axis: Simulation time (sec), 0 to 100; y-axis: Grid position, 0 to 100; series: Black hole.]

Figure 18: Black hole nodes with greatest negative impact

The black hole node’s position at the start of a simulation run influenced how strongly the attack affected network performance. Figure 17 shows the initial black hole node positions that caused minimal interference with packet transmissions. Nodes that originally occupied such locations on the grid mostly failed to become part of the routes during the simulation runs, since communications were successful in those scenarios. The figure shows that the black hole nodes were mostly never encountered when they were initialised at grid positions between node 20 and node 40. Such consistency correlated with the minimal node mobility setup. The greatest attack impacts were recorded when the black hole nodes were initialised at the positions shown in Figure 18. With malicious nodes occupying those positions at the beginning of each simulation run, most of the packets were never delivered, as they were consumed by the black hole nodes. This indicates that the attacker succeeded in attracting the traffic and then dropping the packets, i.e., a successful execution of the black hole attack.

6. CONCLUSION AND FUTURE WORK

This section presents the paper’s concluding remarks and gives future research directions in the study’s focus area. Conclusions on the network performance comparison between the AODV and DSR routing protocols are presented in section 6.1, while concluding remarks focusing further on AODV routing are presented in section 6.2.

6.1 Joint analysis of AODV and DSR in a highly dynamic network

This paper has analysed the black hole attack on MANET reactive routing protocols (AODV and DSR). The analysis is done by varying the mobility of the nodes to determine the effect that mobility has on the way the protocols perform. The results obtained from simulations indicate that the performance of DSR degrades more than the performance of AODV when the speed of the nodes is increased, so it can be concluded that AODV is preferable in a high mobility network. Furthermore, the results show that the black hole attack degrades the performance of both AODV-based MANET and DSR-based MANET, but the impact is more severe on AODV than on DSR. It can therefore be concluded that DSR is preferable in a network that is frequently attacked by the black hole.

6.2 Isolated AODV analysis at low mobility

Standard AODV: Ideal conditions (e.g., long transmission ranges, low node mobility and a densely populated network) were set up to favour the AODV routing protocol, which resulted in good performance despite the dynamic topologies. It was found that sufficiently long node transmission ranges in a relatively less dynamic network yielded AODV’s best performance. Such favoured AODV performance may be too good for real-world scenarios, where device settings are not necessarily uniform. The favourable settings were intentionally put in place to give the protocol an upper hand in the presence of an attacker in the network, so as to register some successful transmissions for performance analysis purposes.

Standard AODV with black hole nodes: A different network performance was noted whenever a simulation run was conducted in the presence of a black hole node. The degree of performance degradation depended on the attacker’s position at the beginning of each simulation run. Total packet losses were registered whenever a black hole node was located closer to a source node during transmission, leading to the worst cases of network performance. However, successful communications were recorded whenever the black hole node was located far away from the source node, mostly at the grid peripheries. This meant that the malicious node was not encountered during the packet transmissions, as the chances of having it as an intermediary node were reduced. The simulation setup was favourable to the AODV routing protocol, with only 1% of the network nodes being set as a black hole in every simulation run. Real-world network performance may be worse than the simulation test results, since MANETs are mostly deployed in hostile environments, which may have many malicious nodes at a given time that completely halt network operations through a cooperative black hole attack. In addition, a real-world implementation may show lower network performance when more packets are dropped naturally due to channel errors, e.g., transmission collisions on the wireless medium.

The results analysis confirms the need to investigate black hole attack solutions that can vary their priorities (detection metric parameters) based on the suspect’s location relative to a source node. This implies that a suspect closer to a transmitting node should receive higher penalties so that it is blacklisted earlier, to avoid potentially devastating attacks. Future work will focus on proposing a black hole attack solution that takes into consideration the position of a potential black hole node during route discovery. Future work will also include comparisons of results obtained from simulation runs against those from real-world experiments under similar setups, to analyse simulation error margins. In addition, future work will consider experiments with different mobility models and different traffic patterns.

7. REFERENCES

[1] C. Yu, T. K. Wu, R. Cheng and S. Chang: "A distributed and cooperative black hole node detection and elimination mechanism for ad hoc networks", Emerging Technologies in Knowledge Discovery and Data Mining, pp. 538-549, 2007.

[2] K. Osathanunkul and N. Zhang: "A countermeasure to black hole attacks in mobile ad hoc networks", Proceedings of the IEEE International Conference on Networking, Sensing and Control (ICNSC), pp. 508-513, 2011.

[3] B. Wu, J. Chen, J. Wu and M. Cardei: "A survey of attacks and countermeasures in mobile ad hoc networks", Wireless Network Security Springer, pp. 103-135, 2007.

[4] C. Rajabhushanam and A. Kathirvel: "Survey of wireless MANET application in battlefield operations", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 2, pp. 50-58, 2011.

[5] R. Mishra, S. Sharma and R. Agrawal: "Vulnerabilities and security for ad-hoc networks", Proceedings of the International Conference on Networking and Information Technology (ICNIT), pp. 192-196, 2010.

[6] N. Sharma and A. Sharma: "The black-hole node attack in MANET", Proceedings of the 2nd International Conference on Advanced Computing & Communication Technologies (ACCT), pp. 546-550, 2012.

[7] Y. Rajesh and S. Anil: "Secure AODV protocol to mitigate black hole attack in Mobile Ad hoc Networks", Proceedings of the 3rd International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-4, 2012.

[8] I. Zaiba: "Security issues, challenges and solution in MANET", International Journal of Computer Science and Technology, Vol. 2 No. 4, pp. 108-112, 2011.

[9] P. Goyal, V. Parmar and R. Rishi: "MANET: vulnerabilities, challenges, attacks, application", IJCEM International Journal of Computational Engineering & Management, Vol. 11, pp. 32-37, 2011.

[10] U. K. Singh, K. Phuleria, S. Sharma and D. Goswami: "An analysis of Security Attacks found in Mobile Ad-hoc Network", International Journal of Advanced Research in Computer Science, Vol. 5 No. 5, pp. 34-39, 2014.

[11] W. Li and A. Joshi: "Security issues in mobile ad hoc networks-a survey", Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, pp. 1-23, 2008.

[12] B. Kannhavong, H. Nakayama, Y. Nemoto, N. Kato and A. Jamalipour: "A survey of routing attacks in mobile ad hoc networks", IEEE Wireless Communications, Vol. 14 No. 5, pp. 85-91, 2007.

[13] M. K. J. Kumar and R. S. Rajesh: "Performance Analysis of MANET Routing Protocols in Different Mobility Models", IJCSNS International Journal of Computer Science and Network Security, Vol. 9 No. 2, pp. 22-29, 2009.

[14] V. C. Giruka and M. Singhal: "Secure Routing in Wireless Ad-Hoc Networks", Signals and Communication Technology, pp. 137-158, 2007.

[15] P. K. Singh and G. Sharma: "An efficient prevention of black hole problem in AODV routing protocol in MANET", Proceedings of the 11th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 902-906, 2012.

[16] F. Tseng, L. Chou and H. Chao: "A survey of black hole attacks in wireless mobile ad hoc networks", Human-Centric Computing and Information Sciences, Vol. 1 No. 4, pp. 1-16, 2011.

[17] A. N. Thakare and M. Joshi: "Performance Analysis of AODV & DSR Routing Protocol in Mobile Ad hoc Networks", IJCA Special Issue on MANETs, Vol. 1 No. 4, pp. 211-218, 2010.

[18] R. Agrawal, R. Tripathi and S. Tiwari: "Performance evaluation and comparison of AODV and DSR under adversarial environment", Proceedings of the International Conference on Computational Intelligence and Communication Networks (CICN), pp. 596-600, 2011.

[19] R. H. Jhaveri, A. D. Patel, J. D. Parmar and B. I. Shah: "MANET routing protocols and wormhole attack against AODV", International Journal of Computer Science and Network Security, Vol. 10 No. 4, pp. 12-18, 2010.


[20] N. Purohit, R. Sinha and K. Maurya: "Simulation study of black hole and jellyfish attack on MANET using NS3", Proceedings of the Nirma University International Conference on Engineering (NUiCONE), pp. 1-5, 2011.

[21] M. Medadian, A. Mebadi and E. Shahri: "Combat with black hole attack in AODV routing protocol", Proceedings of the 9th IEEE Malaysia International Conference on Communications (MICC), pp. 530-535, 2009.

[22] A. Vani and D. S. Rao: "Removal of black hole attack in ad hoc wireless networks to provide confidentiality security service", International Journal of Engineering Science and Technology, Vol. 3, pp. 2377-2384, 2011.

[23] P. N. Raj and P. B. Swadas: "DPRAODV: A dynamic learning system against blackhole attack in AODV based MANET", IJCSI International Journal of Computer Science Issues, Vol. 2, pp. 54-59, 2009.

[24] R. Suryawanshi and S. Tamhankar: "Performance Analysis and Minimization of Blackhole Attack in MANET", International Journal of Engineering Research and Applications (IJERA), Vol. 2 No. 4, pp. 1430-1437, 2012.

[25] Z. Ahmad, K. A. Jalil and J. Manan: "Black hole effect mitigation method in AODV routing protocol", Proceedings of the 7th International Conference on Information Assurance and Security (IAS), pp. 151-155, 2011.

[26] S. Lu, L. Li, K. Lam and L. Jia: "SAODV: A MANET routing protocol that can withstand black hole attack", Proceedings of the International Conference on Computational Intelligence and Security (CIS'09), pp. 421-425, 2009.

[27] J. Pan and R. Jain: "A survey of network simulation tools: Current status and future development", Internet: http://www1.cse.wustl.edu/~jain/cse567-08/ftp/simtools.pdf, Nov. 24, 2008 [May 5, 2016].

[28] F. Thachil and K. C. Shet: "A Trust Based Approach for AODV Protocol to Mitigate Black Hole Attack in MANET", Proceedings of the International Conference on Computing Sciences, pp. 281-285, 2012.

[29] E. O. Ochola, M. M. Eloff and J. A. van der Poll: "Beyond Watchdog Schemes in Securing MANET’s Reactive Protocols Operating on a Dynamic Transmission Power Control Technique", Proceedings of the SAI Computing Conference, pp. 637-643, 2016.


SAIEE AFRICA RESEARCH JOURNAL – NOTES FOR AUTHORS

This journal publishes research, survey and expository contributions in the field of electrical, electronics, computer, information and communications engineering. Articles may be of a theoretical or applied nature, must be novel and must not have been published elsewhere.

Nature of Articles

Two types of articles may be submitted:

• Papers: Presentation of significant research and development and/or novel applications in electrical, electronic, computer, information or communications engineering.

• Research and Development Notes: Brief technical contributions, technical comments on published papers or on electrical engineering topics.

All contributions are reviewed with the aid of appropriate reviewers. A slightly simplified review procedure is used in the case of Research and Development Notes, to minimize publication delays. No maximum length for a paper is prescribed. However, authors should keep in mind that a significant factor in the review of the manuscript will be its length relative to its content and clarity of writing. Membership of the SAIEE is not required.

Process for initial submission of manuscript

Preferred submission is by e-mail in electronic MS Word and PDF formats. PDF format files should be ‘press optimised’ and include all embedded fonts, diagrams etc. All diagrams to be in black and white (not colour). For printed submissions contact the Managing Editor. Submissions should be made to:

The Managing Editor, SAIEE Africa Research Journal, PO Box 751253, Gardenview 2047, South Africa.

E-mail: [email protected]

These submissions will be used in the review process. Receipt will be acknowledged by the Editor-in-Chief and subsequently by the assigned Specialist Editor, who will further handle the paper and all correspondence pertaining to it. Once accepted for publication, you will be notified of acceptance and of any alterations necessary. You will then be requested to prepare and submit the final script. The initial paper should be structured as follows:

• TITLE in capitals, not underlined.

• Author name(s): First name(s) or initials, surname (without academic title or preposition ‘by’)

• Abstract, in single spacing, not exceeding 20 lines.

• List of references (references to published literature should be cited in the text using Arabic numerals in square brackets and arranged in numerical order in the List of References).

• Author(s) affiliation and postal address(es), and email address(es).

• Footnotes, if unavoidable, should be typed in single spacing.

• Authors must refer to the website: http://www.saiee.org.za/arj where detailed guidelines, including templates, are provided.

Format of the final manuscript

The final manuscript will be produced in a ‘direct to plate’ process. The assigned Specialist Editor will provide you with instructions for preparation of the final manuscript and required format, to be submitted directly to: The Managing Editor, SAIEE Africa Research Journal, PO Box 751253, Gardenview 2047, South Africa.

E-mail: [email protected]

Page charges

A page charge of R200 per page will be charged to offset some of the expenses incurred in publishing the work. Detailed instructions will be sent to you once your manuscript has been accepted for publication.

Additional copies

An additional copy of the issue in which articles appear, will be provided free of charge to authors. If the page charge is honoured the authors will also receive 10 free reprints without covers.

Copyright

Unless otherwise stated on the first page of a published paper, copyright in all contributions accepted for publication is vested in the SAIEE, from whom permission should be obtained for the publication of whole or part of such material.


South African Institute for Electrical Engineers (SAIEE)
PO Box 751253, Gardenview, 2047, South Africa
Tel: 27 11 487 3003 | Fax: 27 11 487 3002
E-mail: [email protected] | Website: www.saiee.org.za