EVALUATING MACHINE LEARNING TECHNIQUES FOR SMART … · The Internet of Things (IoT) is expected to have a network of over 31 billion devices by 2020 . IoT devices have the potential

EVALUATING MACHINE LEARNING TECHNIQUES FOR SMART HOME DEVICE CLASSIFICATION

THESIS

Angelito E. Aragon Jr., Captain, USAF

AFIT-ENG-MS-19-M-006

DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY

AIR FORCE INSTITUTE OF TECHNOLOGY

Wright-Patterson Air Force Base, Ohio

DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

The views expressed in this document are those of the author and do not reflect the official policy or position of the United States Air Force, the United States Department of Defense or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.


EVALUATING MACHINE LEARNING TECHNIQUES FOR SMART HOME DEVICE CLASSIFICATION

THESIS

Presented to the Faculty

Department of Electrical and Computer Engineering

Graduate School of Engineering and Management

Air Force Institute of Technology

Air University

Air Education and Training Command

In Partial Fulfillment of the Requirements for the

Degree of Master of Science in Cyber Operations

Angelito E. Aragon Jr., B.S.C.E.

Capt, USAF

March 2019

DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.


EVALUATING MACHINE LEARNING TECHNIQUES

FOR SMART HOME DEVICE CLASSIFICATION

Angelito E. Aragon Jr., B.S.C.E. Capt, USAF

Committee Membership:

Barry E. Mullins, Ph.D., P.E. Chair

Brett J. Borghetti, Ph.D.

Member

Timothy H. Lacey, Ph.D., CISSP Member

i


Abstract

Smart devices in the Internet of Things (IoT) have transformed the management of

personal and industrial spaces. Leveraging inexpensive computing, smart devices enable

remote sensing and automated control over a diverse range of processes. Even as IoT

devices provide numerous benefits, it is vital that their emerging security implications are

studied. IoT device design typically focuses on cost efficiency and time to market, leading

to limited built-in encryption, questionable supply chains, and poor data security. In a 2017

report, the United States Government Accountability Office recommended that the

Department of Defense investigate the risks IoT devices pose to operations security,

information leakage, and endangerment of senior leaders [1].

Recent research has shown that it is possible to model a subject’s pattern-of-life

through data leakage from Bluetooth Low Energy (BLE) and Wi-Fi smart home devices

[2]. A key step in establishing pattern-of-life is the identification of the device types within

the smart home. Device type is defined as the functional purpose of the IoT device, e.g.,

camera, lock, and plug. This research hypothesizes that machine learning algorithms can

be used to accurately perform classification of smart home devices.

To test this hypothesis, a Smart Home Environment (SHE) is built using a variety

of commercially-available BLE and Wi-Fi devices. SHE produces actual smart device

traffic that is used to create a dataset for machine learning classification. Six device types

are included in SHE: door sensors, locks, and temperature sensors using BLE, and smart

ii

bulbs, cameras, and smart plugs using Wi-Fi. In addition, a device classification pipeline

(DCP) is designed to collect and preprocess the wireless traffic, extract features, and

produce tuned models for testing. K-nearest neighbors (KNN), linear discriminant analysis

(LDA), and random forests (RF) classifiers are built and tuned for experimental testing.

During this experiment, the classifiers are tested on their ability to distinguish

device types in a multiclass classification scheme. Classifier performance is evaluated

using the Matthews correlation coefficient (MCC), mean recall, and mean precision

metrics. Using all available features, the classifier with the best overall performance is the

KNN classifier. The KNN classifier was able to identify BLE device types with an MCC

of 0.55, a mean precision of 54%, and a mean recall of 64%, and Wi-Fi device types with

an MCC of 0.71, a mean precision of 81%, and a mean recall of 81%. Experimental results

provide support towards the hypothesis that machine learning can classify IoT device types

to a high level of performance, but more work is necessary to build a more robust classifier.

iii

Acknowledgements

To my family and friends: thank you for your unfailing support and encouragement

throughout this journey. I would not be where I am today without you.

To my advisor, Dr. Mullins: thank you for your mentorship throughout my time

here in AFIT. Your door was always open whenever I needed guidance.

To Dr. Borghetti, Dr. Lacey, and all AFIT faculty: thank you for sharing your time

and knowledge with me. I am grateful that I had the opportunity to learn so much from

your expertise and experience.

Bryan Aragon

iv

Table of Contents Page

Abstract ................................................................................................................................ i Acknowledgements ............................................................................................................ iii List of Figures ................................................................................................................... vii List of Tables .......................................................................................................................x

I. Introduction ..................................................................................................................1

1.1 Background .......................................................................................................1 1.2 Problem Statement ............................................................................................1 1.3 Hypothesis and Research Goals ........................................................................2 1.4 Approach ...........................................................................................................3 1.5 Assumptions/Limitations ..................................................................................3 1.6 Contributions ....................................................................................................3 1.7 Thesis Overview ...............................................................................................4

II. Background and Related Research ...............................................................................5

2.1 Overview ...........................................................................................................5 2.2 Wi-Fi .................................................................................................................5 2.3 Bluetooth Low Energy ......................................................................................6

2.3.1 BLE Link-Layer States ...................................................................... 10 2.3.2 BLE Packet Structure ......................................................................... 12 2.3.3 Creating Connections ......................................................................... 15 2.3.4 Sending Data ...................................................................................... 17 2.3.5 Attributes ............................................................................................ 19 2.3.6 Security .............................................................................................. 23

2.4 Machine Learning ...........................................................................................24 2.4.1 Classification Algorithms .................................................................. 26

2.5 Related Work ..................................................................................................27 2.6 Terminology ...................................................................................................31 2.7 Background Summary ....................................................................................32

III. System Design ............................................................................................................33

3.1 Overview .........................................................................................................33 3.2 Smart Home Environment (SHE) ...................................................................33

3.2.1 Controller ........................................................................................... 33 3.2.2 Wi-Fi Devices .................................................................................... 34 3.2.3 BLE Devices ...................................................................................... 35 3.2.4 Device Actions ................................................................................... 36 3.2.5 Device Location and Setup ................................................................ 37

3.3 Device Classification Pipeline (DCP) ............................................................42 3.3.1 Data Collection Hardware .................................................................. 43

v

3.3.2 Data Collection .................................................................................. 44 3.3.3 Data Preprocessing ............................................................................. 49 3.3.4 Model Tuning ..................................................................................... 57

3.4 Data Exploration .............................................................................................60 3.4.1 BLE Data Exploration ........................................................................ 61 3.4.2 Wi-Fi Data Exploration ...................................................................... 65

3.5 Design Summary ............................................................................................68

IV. Methodology ..............................................................................................................69

4.1 Problem/Objective ..........................................................................................69 4.2 System under Test ...........................................................................................69

4.2.1 Assumptions ....................................................................................... 70 4.3 Response Variables .........................................................................................70 4.4 Performance Metrics .......................................................................................73

4.4.1 Matthews Correlation Coefficient (MCC) ......................................... 73 4.4.2 Mean Precision ................................................................................... 74 4.4.3 Mean Recall ....................................................................................... 74 4.4.4 High Performance .............................................................................. 75

4.5 Control Variables ............................................................................................76 4.6 Uncontrolled Variables ...................................................................................76 4.7 Parameters .......................................................................................................77 4.8 Factors .............................................................................................................79 4.9 Experimental Design ......................................................................................80 4.10 Methodology Summary ..................................................................................81

V. Results and Analysis ..................................................................................................82

5.1 Overview .........................................................................................................82 5.2 BLE Classifier Performance ...........................................................................82

5.2.1 BLE Full-featured Classification ....................................................... 82 5.2.2 BLE Best-Features Classification ...................................................... 87 5.2.3 BLE Classification Analysis .............................................................. 88

5.3 Wi-Fi Classifier Performance .........................................................................91 5.3.1 Wi-Fi Full-featured Classification ..................................................... 91 5.3.2 Wi-Fi Best-features Classification ..................................................... 94 5.3.3 Wi-Fi Classification Analysis ............................................................ 95

5.4 Results Summary ..........................................................................................101

VI. Conclusion ................................................................................................................103

6.1 Overview .......................................................................................................103 6.2 Research Conclusions ...................................................................................103 6.3 Research Significance ...................................................................................105 6.4 Future Work ..................................................................................................105

Appendix A. Device Details .......................................................................................108

Appendix B. Hyperparameter Tuning Script ..............................................................111

vi

Appendix C. Experimental Procedure ........................................................................116

Appendix D. Multiclass BLE Classification ...............................................................117

Appendix E. Multiclass Wi-Fi Classification .............................................................123

Appendix F. Classification Analysis Graphs .............................................................130

Bibliography ....................................................................................................................132

vii

List of Figures

Figure Page

1. Wi-Fi Frame Fields [5] ................................................................................................. 6

2. BLE Architecture [6] .................................................................................................... 9

3. BLE Channel Map [6] ................................................................................................. 10

4. BLE States [6] ............................................................................................................. 10

5. BLE Link-Layer Packet Structure [6] ......................................................................... 12

6. BLE Data Packet ......................................................................................................... 15

7. BLE Connection and Data Sending Diagram ............................................................. 16

8. Data Transmission [6] ................................................................................................. 19

9. GATT Client-Server Interaction [8] ........................................................................... 20

10. GATT Structure [8]................................................................................................... 23

11. Netgear Nighthawk X4S R7800 router used as AP .................................................. 34

12. Wi-Fi Access Point Settings ..................................................................................... 34

13. SHE Device Locations (not to scale) ........................................................................ 38

14. Door Sensor Setup .................................................................................................... 39

15. Light Bulb Setup ....................................................................................................... 40

16. Lock Setup ................................................................................................................ 40

17. Plug Setup ................................................................................................................. 41

18. Temperature Sensor Setup ........................................................................................ 41

19. DCP System Diagram ............................................................................................... 42

viii

20. Scanning and Sniffing Equipment. Plugable Bluetooth adapter (left), Alfa

AWUS036ACH Wi-Fi adapter (center), and Ubertooth One BLE sniffer (right) [29]–

[31] ............................................................................................................................. 43

21. Sniffer Layout ........................................................................................................... 45

22. Commands to prepare Alfa card for data collection ................................................. 46

23. Commands used to scan for Wi-Fi AP ..................................................................... 47

24. Command used to scan for Wi-Fi devices associated to the AP............................... 48

25. Commands used to scan for BLE devices................................................................. 49

26. One-Hot Encoding .................................................................................................... 52

27. BLE Packet Length ................................................................................................... 61

28. BLE Link Layer Header Length ............................................................................... 62

29. BLE Associated Packet Count .................................................................................. 63

30. BLE RF Channels ..................................................................................................... 63

31. BLE PDU Types ....................................................................................................... 64

32. Wi-Fi Packet Length ................................................................................................. 66

33. Wi-Fi Vendors .......................................................................................................... 66

34. Wi-Fi Associated Packet Count ................................................................................ 67

35. Wi-Fi Packet Subtype ............................................................................................... 68

36. System under test diagram ........................................................................................ 70

37. A 𝒌𝒌 𝒙𝒙 𝒌𝒌 Confusion Matrix (𝒌𝒌 = 𝟑𝟑) .......................................................................... 72

38. Motion Source Appearance (left) and Location in SHE (right) ................................ 78

39. Confusion Matrices from BLE Full-Featured Classification .................................... 84

ix

40. Confusion Matrices from BLE Door Sensors vs Temperature Sensors ................... 90

41. Confusion Matrices from Wi-Fi Full-Featured Classification .................................. 92

42. Confusion Matrices from Wi-Fi Cameras vs Plugs ................................................ 100

43. Confusion Matrices from Wi-Fi Classification (No Vendor Features) .................. 100

44. BLE Door Sensors vs. Temperature Sensors Features ........................................... 130

45. Wi-Fi Cameras vs. Plugs Features .......................................................................... 131

x

List of Tables

Table Page

1. Advertising PDU Types .............................................................................................. 14

2. Wi-Fi Devices ............................................................................................................. 35

3. BLE Devices ............................................................................................................... 36

4. Device Actions ............................................................................................................ 37

5. Wi-Fi Vendor List ....................................................................................................... 51

6. Wi-Fi Dataframe Columns .......................................................................................... 53

7. Wi-Fi Device Set Assignment .................................................................................... 54

8. BLE Device Set Assignments ..................................................................................... 56

9. BLE Dataframe Columns ............................................................................................ 57

10. Hyperparameters used in Grid Search ...................................................................... 59

11. Best-Performing Hyperparameters ........................................................................... 60

12. Response Variables ................................................................................................... 73

13. Performance Metrics ................................................................................................. 76

14. Classifier Performance in BLE Full-Featured Classification ................................... 83

15. Device Type Precision in BLE Full-Featured Classification .................................... 85

16. Classifier Recall in BLE Full-Featured Classification.............................................. 85

17. Feature Importance in BLE Full-Featured Classification ......................................... 87

18. Classifier Performance in BLE Best-3 Feature Classification .................................. 88

19. Classifier Performance in BLE Door Sensors vs Temperature Sensors ................... 89

20. Classifier Performance in Wi-Fi Full-Featured Classification ................................. 91

xi

21. Device Type Precision in Wi-Fi Full-Featured Classification .................................. 93

22. Device Type Recall in Wi-Fi Full-Featured Classification ...................................... 93

23. Feature Importance in Wi-Fi Full-Featured Classification ....................................... 94

24. Classifier Performance in Wi-Fi Best-3 Feature Classification ................................ 95

25. Classifier Performance in Wi-Fi Cameras vs Plugs (Full-Featured Classification) . 97

26. Feature Importance in Wi-Fi Cameras vs Plugs (Full-Featured Classification) ....... 97

27. Classifier Performance in Wi-Fi Cameras vs Plugs (Best Features Classification) . 98

28. Classifier Performance in Wi-Fi Classification (No Vendor Features) .................... 99

29. Feature Importance in Wi-Fi Classification (No Vendor Features) .......................... 99

30. High and Low Performance BLE Classifiers .......................................................... 101

31. High and Low Performance Wi-Fi Classifiers ........................................................ 102

1

EVALUATING MACHINE LEARNING TECHNIQUES

FOR SMART HOME DEVICE CLASSIFICATION

I. Introduction

1.1 Background

Smart devices are increasingly being used in consumer and industrial applications.

Once connected to the Internet, these smart devices allow for remote sensing and control

of a wide variety of processes. The Internet of Things (IoT) is expected to have a network

of over 31 billion devices by 2020 [3]. IoT devices have the potential to affect personal

and commercial spaces, and therefore need to be studied for cybersecurity implications.

IoT device design often focuses on minimizing power and cost [4]. Such design decisions

can result in deficient security that cause information leakage. IoT devices regularly

perform automatic functions upon a subject’s arrival or departure. Traffic from these

devices can be analyzed to figure out a subject’s pattern-of-life [2]. By learning which

type of devices are activating upon a subject’s presence, a malicious actor can gain

exploitable information. Therefore, it is important to study whether such IoT device

classification is possible.

1.2 Problem Statement

Recent research has shown that it is possible to model a subject’s pattern-of-life

through data leakage from Bluetooth Low Energy (BLE) and Wi-Fi smart home devices

[2]. BLE and Wi-Fi are two widely-used protocols in IoT devices that can leak sensitive

2

information wirelessly, available for malicious attackers to collect and analyze without user

awareness. A critical step in establishing pattern-of-life is the identification of the device

types within the smart home. Device type is defined as the functional purpose of the IoT

device, and extends across a broad spectrum including cameras, electrical plugs, light

bulbs, door locks, temperature sensors, and motion sensors. Previous techniques in IoT

device classification have been limited to manual packet analysis, a deliberate process that

requires specific knowledge of target devices. This research seeks to leverage machine

learning algorithms to produce a generalized and scalable method of IoT device

classification. The problem statement this work answers is whether machine learning can

be applied to successfully classify devices into their respective device types using collected

wireless traffic.

1.3 Hypothesis and Research Goals

This work hypothesizes that if machine learning classifiers are trained using

wireless traffic from a realistic smart home environment, then the classifiers can

successfully identify the device type of IoT devices to a high degree of accuracy.

The goals that guide this research are:

1. Design and build a source of realistic smart home device traffic.

2. Develop procedures to collect and prepare the wireless traffic for machine

learning classification.

3. Evaluate the performance of the linear discriminant analysis (LDA), k-

nearest neighbors (KNN), and random forests (RF) machine learning

classification algorithms in determining IoT device types.

3

4. Determine which features are most useful for classification purposes.

5. Assess the suitability of machine learning in IoT device type classification.

1.4 Approach

A smart home environment composed of commercially-available BLE and Wi-Fi

devices is assembled to produce authentic wireless traffic. The wireless traffic is collected

and preprocessed into a dataset suitable for machine learning. Classifiers are tuned and

trained. The models are tested on a number of classification tasks to evaluate their

performance. Results are synthesized to consider algorithm performance and device

security implications. The machine learning approach uses a multiclass classification

scheme, with three device types per wireless protocol used as response classes. K-nearest

neighbors, random forests, and linear discriminant analysis are the classification algorithms

used.

1.5 Assumptions/Limitations

The following assumptions and limitations are recognized throughout this

experiment:

• The devices selected in the smart home environment are representative of

an authentic smart home.

• All devices are compatible with an Apple iPhone.

• All algorithms are accurately implemented by third-party libraries.

1.6 Contributions

This research adds to the fields of IoT security and machine learning classification

through two primary contributions:

4

1. Smart Home Environment (SHE): A smart home architecture using BLE

and Wi-Fi smart devices is designed to provide realistic wireless traffic that

can be used for analysis and classification.

2. Device Classification Pipeline (DCP): A system of machine learning

techniques is applied to collect, process, and analyze the wireless traffic

produced by the smart home devices.

1.7 Thesis Overview

This thesis is organized into six chapters. Chapter 2 presents an overview of

relevant wireless protocols, machine learning techniques, classification algorithms, and

other related research. Chapter 3 provides the design details of the SHE and DCP systems

used to create, capture, prepare, and analyze the wireless traffic used in the experiment.

Chapter 4 discusses the experiment methodology, while Chapter 5 presents the analysis of

results. Lastly, Chapter 6 provides a summary of the work and considers possible avenues

for future work in this research area.

5

II. Background and Related Research

2.1 Overview

This chapter presents a technical review of the wireless protocols Wi-Fi and BLE

in Sections 2.2 and 2.3 respectively to describe what features of their architecture and

packet structure are applied in machine learning classification. Section 2.4 follows with a

brief description of machine learning, and Section 2.5 provides a summary of traffic

analysis research, the current state of IoT device classification, and a discussion of related

research. Lastly, Section 2.6 offers a list of common terminology used throughout this

research.

2.2 Wi-Fi

By far, the most commonly used technology for wireless local area networks

(WLANs) is defined by the IEEE 802.11 standard, also known as Wi-Fi [1]. IEEE 802.11,

hereafter referred to as 802.11, defines the medium access control (MAC) and Physical

Layers (PHY). In wireless networks, a station (STA) is the addressable unit, and the basic

service set (BSS) is the fundamental building block of a WLAN. The BSS is the effective

area within which member STAs of the BSS can continue communication. In

infrastructure mode, WLAN topology is centered on an access point (AP) that connects

STAs from the WLAN to the wired network. A service set identifier (SSID) serves as the

primary name associated with a WLAN and is typically used by STAs to find WLANs.

Association is the process through which a STA connects to an AP. 802.11 expects

the AP to periodically send out beacon frames, containing the AP’s SSID and media access

control (MAC) address. The STA seeks out these beacon frames by continuously scanning

6

the wireless channels defined in 802.11. Once an AP has been selected, the STA sends an

association request frame to the AP, and the AP responds with an association response

frame. After this process, the AP typically assigns an IP address to the STA through a

Dynamic Host Configuration Protocol (DHCP) exchange. Once completed, the STA has

joined the AP’s subnet and is viewed as simply another device in that subnet.

The general frame used to transmit data in 802.11 is illustrated in Figure 1. The

frame consists of various fields: frame control, duration/identification, address fields,

sequence control, frame body, and frame check sequence (FCS). The frame control field

contains the protocol version, frame type and subtype, and other control information. The

duration/identification field specifies the transmission time required for the frame. The

four address fields include the destination address, source address, receiver address, and,

occasionally, the transmitter address. The sequence control field helps identify duplicate

frames. The frame body, also known as the Data field, moves the higher-layer payload

between stations. The FCS is a checksum appended to the frame to detect corruption. If

the receiver calculates a different FCS than the FCS included in the frame, the frame is

deemed corrupted and is discarded.

Figure 1. Wi-Fi Frame Fields [5]

2.3 Bluetooth Low Energy

Bluetooth Low Energy (BLE) is a separate technology from classic Bluetooth, with

different design goals [6]. BLE, sometimes referred to as Bluetooth 4.0, was first

7

introduced in 2010 by the Bluetooth Core Specification 4.0 [7]. While classic Bluetooth

focuses on high data rates, BLE has been optimized for ultra-low power applications.

Bluetooth Low Energy is not trying to improve on Bluetooth classic; instead, it targets new

applications that have not previously used open wireless standards. These applications are

those that require devices to send minimal octets of data from once a second to once every

few days. By design, BLE is intended to minimize not only overall activity, but even the

time required to do anything useful. If a device is operating, even if it is nothing more than

checking whether it needs to send or receive something, it is using energy.

Certain key elements support low cost, including its industrial, scientific, and

medical (ISM) band, intellectual property license and low power. The 2.4 GHz ISM band

may have poor propagation, but is available worldwide with no license requirements. The

Bluetooth Special Interest Group (SIG) only requires a very low cost intellectual property

license. Finally, the best way to design a low-cost device is to reduce required materials

such as batteries. BLE was designed to work with the smallest, cheapest, and most readily

available battery option – button-cell batteries.

The BLE architecture is split into three parts: controller, host, and applications, as

shown in Figure 2. The controller is a physical device that transmits and receives radio

signals, and can convert these signals into packets with information. Within the controller

are the physical and link layers, as well as the lower half of the Host Controller Interface.

The controller can be identified as the Bluetooth chip or radio. The controller

communicates to other devices using an antenna, and to the host using the Host Controller

Interface.

8

The host is a software stack that directs how multiple devices communicate with

one another, typically managing several services at the same time. The host controls the

Logical Link Control and Adaptation Protocol (L2CAP), the Security Manager Protocol,

Attribute Protocol, Generic Attribute Profile (GATT), and Generic Access Profile (GAP).

The L2CAP handles the passing of data between host and the controller through channels.

BLE uses three fixed channels: one each for connection management data, the Security

Manager, and the Attribute Protocol. The Security Manager Protocol handles device

pairing and key distribution. The Attribute Protocol defines the rules for accessing data by

another device through the use of attributes. The GATT resides above the Attributes

Protocol and defines the types of attributes and how they can be used. More detail on how

attributes work is included in Section 2.3.5. Lastly, the GAP controls how devices

discover, connect, and provide information to users in the application layer. It also outlines

the procedures needed to discover, connect, and pair with other devices by controlling the

link layer states. Finally, applications use the BLE architecture to provide various

functions to users.

9

Figure 2. BLE Architecture [6]

The physical layer of the controller transmits and receives bits using the 2.4 GHz

band. The frequency of the radio waves use a modulation scheme called Gaussian

Frequency Shift Keying (GFSK) that shifts the frequency slight up and down over a

Gaussian filter. Compared to classic Bluetooth’s 79 1-MHz channels, BLE is split into 40

separate channels, with a 2 MHz separation between one another, as shown in Figure 3.

Figure 3 also shows the advertising channels, indicated by the darkened channels. When

transmitting data, BLE transmits at the rate of 1 million bits per second (Mbps), with a

maximum transmit power of 10 mW.

10

Figure 3. BLE Channel Map [6]

2.3.1 BLE Link-Layer States

The link layer describes packet details, advertising, and data channels. It also

describes how device discovery, data broadcasting, and connections operate. As shown in

Figure 4, the link layer defines five states: Standby, Advertising, Scanning, Initiating, and

Connected.

Figure 4. BLE States [6]

11

Upon powering on, devices start in the standby state and remain there until the host

layers instruct them otherwise. Devices in the standby state can move into all other states.

Once fully powered, the advertising state can be initiated by the application through the

GAP. In the advertising state, the link layer can transmit advertising packets or respond to

scan requests. Devices that want to be discoverable or connectable must be in the

advertising state. Devices in the advertising state can only move to the connected or

standby state.

The scanning state allows a device to receive advertising channel packets from

other devices in the local area. There are two types of scanning: passive scanning and

active scanning. Passive scanning only receives advertising packets; the device never

transmits anything. In active scanning, the device additionally sends scan requests to all

advertising devices. The advertising device then replies with a scan response. Both the

scan requests and response packets are sent on the advertising channel.

To initiate a connection between devices, the link layer must go through the

initiating state. In this state, the initiating device listens for advertising packets from the

device with which it is trying to connect. Once an advertising packet is received, the link

layer sends a connect request to the advertising device and moves into the connected state.

The last state of the link layer is the connected state. The connected state can only

be entered through the advertising or initiating states. It is only in the connected state that

data channel packets are sent and received. There are two substates: master or slave. Only

the device that initiates the connection can become the master. A master device must

regularly send packets to the slave device. The slave substate can only be entered from the

12

advertising state. The device that becomes the slave must have been advertising to another

device. A slave device can only transmit in response to the master device. Devices cannot

be both master and slave simultaneously, nor can a device be a slave of two masters at the

same time.

2.3.2 BLE Packet Structure

The packet is the standard data block of the link layer. There are two types of

packets: advertising and data packets. Advertising packets are used to find and connect to

other devices, while data packets are used once a connection is established. The packet

type is determined by the channel on which the packet is transmitted. If a packet is

transmitted on one of the three advertising channels, then it is an advertising packet; if it is

transmitted on any of the 37 data channels, it is a data packet.

Link-layer packets follow the structure as displayed in Figure 5. These are divided

into the preamble, access address, header, length, data and cyclic redundancy check (CRC)

fields.

Figure 5. BLE Link-Layer Packet Structure [6]

The preamble, or the first 8 bits of a packet, is always either a 01010101 or

10101010 sequence, randomly selected. These simple sequences allow the radio to adjust

13

gain and determine the frequencies used for zero and one bits. The access address is the

next 32 bits and can be one of two types based on packet type: advertising access address

or data access address. Advertising access addresses are set at a fixed value

(0x8E89BED6) to help standardize the advertising process. Data channels use a different

random access address on each and every connection, which is used when data must be

reliably delivered to another device.

The header field varies based on the packet type. For advertising packets, the

header contains the advertising protocol data unit (PDU) type. Table 1 provides a summary

for each PDU type. ADV_IND indicates that the device is advertising as connectable

(available to create a connection) and undirected (not looking to connect to a specific

device); this is the advertising packet type most commonly used. ADV_DIRECT_IND

indicates that the device is connected and directed (looking for a specific device with which

to connect). ADV_NONCONN_IND indicates that the device is nonconnectable (refuses

to connect) and undirected; this is used by devices seeking to only broadcast data.

ADV_SCAN_IND, SCAN_REQ, and SCAN_RSP are used during active scanning.

ADV_SCAN_IND indicates that the advertising device is open to active scanning,

SCAN_REQ is a request made by the initiating device to receive a scan response, and

SCAN_RSP is the scan response itself. Lastly, the CONNECT_REQ header type is sent

by an initiating device to an advertising device when the initiating device wants to create a

connection. CONNECT_REQ packets contain information needed to establish a

connection.

14

Table 1. Advertising PDU Types

PDU Type Purpose 1 ADV_IND General advertising indication 2 ADV_DIRECT_IND Direct connection indication 3 ADV_NONCONN_IND Nonconnectable advertising indication 4 ADV_SCAN_IND Scannable indication 5 SCAN_REQ Active scanning request 6 SCAN_RSP Active scanning response 7 CONNECT_REQ Connection request

Data packets have headers containing the logical link identifier (LLID), sequence

number (SN), next expected sequence number (NESN), and more data, as shown in Figure

6. The LLID is used by the link layer to manage the channel for this connection. The one-

bit sequence number for each new data packet toggles from the previous data packet’s

sequence number, with the first data packet in a connection having a sequence number of

zero. The SN allows the receiving device to determine whether the received packet is a

retransmission of a previous packet or a new packet. The NESN allows for

acknowledgement of data packets. The last bit in the data packet header is the more data

bit, where 1 signals that there is more data to transmit, and 0 signals the end of the data

transmission.

15

Figure 6. BLE Data Packet

The length field reports the size of the packet, with a range of valid values from 6

to 37 bytes for advertising packets, and 0 to 31 bytes for data packets. The payload is the

actual data that is being transmitted for use by the application. The final part of the packet

is a 3-byte CRC. The CRC is calcuated using the header, length and payload fields, and

serves to detect accidental changes to raw data.

2.3.3 Creating Connections

A connection is required to reliably allow for two-way data transfer. Figure 7

shows how a connection is typically created. The first step is for one device to advertise

using an advertising packet (commonly with ADV_IND) and for another device to initiate

a connection to the advertising device with a CONNECT_REQ packet. Using the

information in the CONNECT_REQ packet, a connection is created between the two

16

devices, with the initiating device now the master device and the advertising device as the

slave device.

Figure 7. BLE Connection and Data Sending Diagram

All necessary information is contained within this CONNECT_REQ packet,

including access address, connection interval, and channel map. The access address is

randomly determined by the master. If a master has multiple slaves, it chooses a different

access address for each slave. When in a connection, the master must transmit a packet to

the slave once every connection event. The connection interval determines how frequently

this happens; the connection interval can be any period between 7.5 milliseconds to 4

seconds. Lastly, the channel map is a bit mask of the data channels the connection uses,

where if the bit is set to one, then the channel is deemed a good channel and can be used

17

for data traffic, and if the bit is set to zero, the channel is deemed a bad channel, and is

never used for data traffic.

The Generic Access Profile (GAP) defines the discovery and connection process

between devices. The GAP provides two types of discoverability: limited and general.

Limited-discoverable mode is used by devices that have just been made discoverable, and

are meant to stand out from general-discoverable devices. As such, devices are not allowed

to remain in the limited-discoverable mode for more than 30 seconds. The general-

discoverable mode is used by devices that are discoverable but have been inactive for a

period of time. This becomes the default mode for devices once they exceed the 30 seconds

allowed for limited-discoverable mode.

2.3.4 Sending Data

Once in a connection, devices can send data to each other using data packets. Data

packets have four fields in their header: logical link identifier, sequence number, next

expected sequence number and more data. The logical link identifer (LLID) determines

what kind of data the packet contains. The LLID can indicate that the packet is a link layer

control packet, which is used by the link layer to manage connections. Otherwise, it is a

data packet intended for the host, and can either be a start packet or continuation packet.

Start packets signal the beginning of a series of data packets, and continuation packets

make up the rest of the transmission. Interestingly, because the link layer does not need to

know the entire length of the data, continuation packets can be continuously sent. Data

packets have a single bit for the sequence number, beginning with zero for the first data

packet. It then alternates between one and zero for each new data packet. To acknowledge

18

a data packet, the next expected sequence number (NESN) is used. If the data packet

received by a device has a sequence number of one, then the NESN is zero; otherwise, the

data packet would be retransmitted. Lastly, the MD bit indicates that the transmitting

device has more data ready to send. If one, then the receiving device maintains the

connection. If zero, then the two devices can close the connection to save power.

Figure 8 provides an example of how connection events occur between two

connected devices. A connection event is the start of a set of data packets sent from the

master to the slave and back again. Connection events are always initiated by the master

device. The master device initiates the transmission by sending a data packet with SN

zero, NESN zero, and MD one. The slave device receives this packet and attempts to send

its own packet, with SN zero, NESN one (acknowledging that the previous packet) and

MD one. However this packet was not properly received by the master device. Without an

acknowledgment from the slave device, the master device retransmits its first packet. The

slave device detects that retransmission of the previous packet is required, and does so.

This time, the packet is properly received. The master device no longer needs more data

from the slave device and sends a packet with the MD bit zero. The slave device

acknowledges this by sending its own packet with MD zero, and the connection event

between the two devices end. A second connection event is initiated by the master, but the

MD bit is set to zero. This type of connection event is typically performed to check on the

slave’s status, serving as a “ping”. The slave receives the packet, and seeing that MD is

zero, acknowledges the previous packet and ends the connection event.

19

Figure 8. Data Transmission [6]

2.3.5 Attributes

The Attribute Protocol, shown in Figure 2, is central to understanding Bluetooth

Low Energy. BLE is designed as a client-server architecture, where a server is a device

that has data, and a client is any device that is using data from another device. Figure 9

shows how the client-server architecture works. In practice, the master device acts as the

client requesting data from its slave devices who act as servers.

20

Figure 9. GATT Client-Server Interaction [8]

Attributes are the fundamental structure through which BLE achieves the client-

server architecture. The Generic Attribute Profile (GAP) is a set of rules that define how

to present, group, and transfer data using BLE. The GAP defines attributes as a piece of

labeled, addressable data, and each attribute has three parts: a handle, a type, and a value.

The attribute handle is the attribute’s 16-bit address. The attribute type is comparable to a

data type in programming languages, and is used to identify the nature of the attribute’s

information (e.g., temperature, pressure, time, etc.). Lastly, the attribute value is the actual

value and has a size between 0 to 512 bytes. Attributes are stored in an attribute database,

which is in turn contained within an attribute server. Clients communicate with the

attribute server to obtain desired information. There can only be one attribute server per

device, and every device must have both an attribute server and an attribute database.

Permissions must be set for every attribute in an attribute database, and these

permissions come in three categories: access, authentication, and authorization. Access

permissions must be set to readable, writable, or readable-writable. Authentication and

authorization permissions are not required and can be left open. The difference between

the two permission types is that authentication occurs at the client level, while authorization

occurs at the server level. It is important to note that these permissions only relate to the

21

attribute value; any device has permission to view the attribute handles and types on a given

device.

The Attribute Protocol is the protocol through which clients find and access

attributes on an attribute server. It is a simple protocol with only six basic operations:

Request, Response, Command, Indication, Confirmation, and Notification.

A Request is sent by the client when the client wants the server to do an action and

send back a Response. A client can only send one Request at a time, and must wait for a

Response before sending another Request. A Command is similar to a Request, except no

Response is needed. Indications are used by the server to inform a client about an update

on a given attribute’s value, and require a Confirmation from the client. Notifications are

similar to Indications, except they need no Confirmation. Since Commands and

Notifications do not require Responses nor Confirmations, they can be sent without any

restrictions. If the receiving device cannot handle all the messages, the messages may be

dropped. Therefore, Commands and Notifications are unreliable, while Requests and

Indications are considered reliable.

Protocol messages are combinations of these basic operations used to perform

common tasks using the Attribute protocol. Their role is comparable to library functions

in programming languages. Most messages consist of both a Request and a Response

operation. For example, the Read Request message uses a Request operation that has a

handle of a desired attribute, and the Response returns the attribute value. Protocol

messages have a wide range of functionality that enable efficient reading, writing, error

handling, and notifications.

22

For connected devices, the Generic Attribute profile (GATT) defines two basic

forms of grouping: characteristics and services. Figure 10 shows how the GATT structure

organizes characteristics and services. Characteristics are defined attribute types that can

only contain certain logical values. The BLE Specification provides over 200 predefined

characteristics such as Alert Status, Language, Battery Level, and Time Zone that can only

take on specific values based on their characteristic definition. A service is a collection of

characteristics and relationships with other services to perform a given function [9].

Sometimes refered to as profiles, services expose certain device information and

functionality in a standardized manner.

Predefined services include the Battery, Environmental Sensing, and Heart Rate

services [7]. For example, consider a personal fitness monitoring device that uses the

Battery, Environmental Sensing, and Heart Rate services. A user may connect the fitness

monitor to a smartphone through BLE. Through the Battery service, the user can monitor

the battery life of the device, ensuring that the device does not run out of power during

workout sessions. Through the Environmental Sensing service, the user can monitor

measurement data from the device’s various sensors, such as air temperature, humidity and

elevation. Finally, through the Heart Rate service, the user can track one’s heart rate

throughout the workout session. While predefined services accommodate common needs,

custom services can also be created to suit developers’ needs.

23

Figure 10. GATT Structure [8]

2.3.6 Security

The Security Manager (SM), shown in Figure 2, serves two fundamental functions:

device pairing and message authentication. These two functions enable the rest of BLE’s

security features. Pairing allows two unfamiliar devices to authenticate each other’s

identity in preparation for activity that requires security. During pairing, each device first

determines each other’s input and output capabilities (e.g., no input-no output, display only,

display yes/no, keyboard only, keyboard display) to determine what level of authentication

is possible. For example, if two devices are both display only, they would not be able to

authenticate via passkey entry, and the SM would default to simply letting the devices pair

automatically without authentication. After determining input and outputs, the SM then

proceeds to authenticate each other using a randomly-generated key, if possible. This is

often implemented by the user typing in a six-digit key. Lastly, keys are distributed

between the devices. Message authentication uses the CMAC (Cipher-based Message

24

Authentication Code) algorithm, with the keys distributed using pairing. An additional

SignCounter value is used to prevent replay attacks.

2.4 Machine Learning

Coined by Arthur Samuel, the term machine learning (ML) refers to a field of

computer science that applies statistical techniques to give computers the capability to learn

from data without explicit programming [10]. ML plays a significant role in the fields of

statistics, data mining, and artificial intelligence. Machine learning is increasingly being

applied today to tasks too complex for traditional approaches or have no known algorithm

[11].

To illustrate, consider the task of distinguishing between different types of fruit.

Traditional approaches would need to first study the problem (for example, what is the

difference between an apple and an avocado) then write rules to solve the problem (apples

are red, avocados are green). However, this problem is more complex (some apples are

also green). Therefore, additional rules must be included to further refine the solution to

the program (apples have smooth skin, avocados have pebbled skin). But as more test

cases and situations are added, this set of rules grows significantly (watermelons are green

but have smooth skin), making the maintenance of these rules very difficult for a human

programmer.

A typical ML scenario seeks to predict an outcome measurement, usually

categorical (e.g., type of fruit) or quantitative (e.g., future house prices), using a set of

features (e.g., fruit color, house location) from a dataset [12]. A set of training data is

25

used to observe the outcome and feature measurements. With this data, a prediction model

is built to predict the outcome for new cases, or observations.

Given a sufficiently large dataset, machine learning excels in the type of problem

presented earlier. ML applies statistical techniques to reveal patterns within data. By

analyzing the data, an ML approach can develop a model using the patterns in the data, and

then produce a solution. Furthermore, the solution can reveal certain insights about the

data that may have been missed.

There are two broad types of machine learning: supervised learning and

unsupervised learning. Supervised learning requires the presence of the outcome

measurement, or labels, to direct the learning process. A typical supervised learning task

is classification, the task of assigning to which set of categories a given observation

belongs. In the fruit classification problem, the fruit type would be an example of a label,

and the task is assigning a fruit type to a given fruit.

As mentioned, classification is considered supervised learning because it requires

the presence of labels. Another supervised learning task is regression. Regression is the

task of predicting a numerical value for a given observation. An example is predicting

future house prices, given the features of house location and age. Because of the use of

labels, supervised learning methods can be evaluated on their performance. Chapter 3

provides the details on performance evaluation.

Unsupervised learning uses unlabeled data. Unsupervised learning aims to infer

the structure of data. A common task for unsupervised learning is clustering. The goal of

clustering is to detect groups of similar observations within the data. These groups may

26

have not been clearly evident, and the detected groups may be used in applications such as

pattern recognition, compression, and graphics. An example of a clustering problem would

be when a website like Amazon or Netflix provides recommendations based on users’

browsing history [13]. By clustering items that are similar to those users have previously

looked at, online retailers or streaming services can suggest certain products or movies.

2.4.1 Classification Algorithms

The algorithms used in this experiment are among the most commonly used

machine learning classification algorithms. All three algorithms are identified as

supervised learning algorithms. Supervised learning requires the outcome measurement

to direct the learning process. In classification, the objective is to assign a given

observation to a particular class or label using a set of inputs or predictors. Because of

their use of outcome measurements, supervised learning algorithms may be evaluated on

their performance.

2.4.1.1 K-Nearest Neighbors

K-Nearest Neighbors (KNN) classifies an observation by finding the observation’s

k nearest neighbors, where k is an integer, and classifying the observation to the class with

the highest estimated probability [14]. A commonly used distance metric is Euclidean

distance, however other distance metrics can be used, such as Manhattan distance or

Chebyshev distance. The hyperparameter k determines the number of neighbors KNN

considers in its classification of an observation. Supposing 𝑘𝑘 = 3, KNN considers the

three closest neighbors of 𝑥𝑥0. The observation is assigned the majority class of these 𝑘𝑘

nearest neighbors. KNN does not make assumptions about how the data is distributed [15].

27

While this approach works well for data with unknown distributions, it leads to a higher

susceptibility to local anomalies within the data. Additionally, if there are many

dimensions in the data, several inputs may be “nearest” to the observation, leading to

reduced effectiveness.

2.4.1.2 Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a linear transformation technique first

proposed by Ronald Fisher in 1936. LDA models the distribution of the predictors

independently in each of the response classes, then applies Bayes’ theorem to find an

estimate of the posterior probability. A Gaussian or normal distribution is typically used

to model the distribution of the predictors. LDA assigns the observation to the response

class with the highest probability. When the assumption about the predictors’ distribution

does not hold, performance is reduced.

2.4.1.3 Random Forests

A random forest is an ensemble of decision tree-based algorithms [14]. Decision

trees divide the feature space into different regions that can then be used to classify

observations. By using a multitude (or ensemble) of decision trees, random forests assign

an observation to the most commonly occurring class in the region to which it belongs. As

with a decision tree, there is a danger of overfitting when applying random forests.

2.5 Related Work

Wired network traffic analysis has been extensively studied. Preliminary methods

targeted port number and payload content analysis; these resulted in considerable success

at the time [16]. However, these approaches have challenges that limit their effectiveness.

28

Port number analysis is accurate only if networks adhere to port standards, and payload

analysis cannot cope with encrypted transmissions.

Transport-layer analysis has been applied to address the limitations of the previous

methods [17]. In peer-to-peer (P2P) networks where arbitrary ports are most commonly

used, transport-layer analysis allows the profiling of IP connection patterns. The

observation of source-destination IP pairs and IP address-port pairs offered a method of

studying P2P traffic without any examination of user payload. However, direct transport-

layer analysis can be time-consuming, and analytical scripts are restricted by programmer

knowledge.

Machine learning has emerged as a promising technique for traffic analysis [18].

Studies can be broadly categorized into unsupervised and supervised approaches. One of

the earliest studies using unsupervised learning, McGregor et al. applied clustering

techniques to group wired traffic flow between six common network protocols and found

that the data rate was a key feature in this effort [19]. Bernaille et al. used a variation of

the k-Means algorithm to classify Transmission Control Protocol (TCP) traffic to the

application type (e.g., file transfer protocol (FTP), hypertext transfer protocol (HTTP),

secure shell (SSH), etc.) using the packets at the start of traffic flow [20]. This approach

focused on the packets used during the TCP handshake and was able to classify traffic flow

with over 80% accuracy. However, this work assumed that the handshake can always be

captured at the onset of traffic flow, which is not always achievable. It was with Erman et

al. that web traffic was analyzed using the k-Means approach [21]. Instead of full bi-

directional traffic flow, the work focused on uni-directional flows between server-to-client

29

and client-to-server. Their results showed that server-to-client datasets produced the

highest accuracy (95%), and that flow duration, number of bytes and number of packets

were the features that provided the most value in classifying packets.

Supervised learning introduced algorithms such as the k-Nearest Neighbors (kNN),

linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA) to traffic

analysis. A study by Roughan et al. used these techniques to classify different network

applications to specific traffic classes [22]. The considered features were analyzed at the

packet level, flow level, and connection level. Packet level features were derived from

individual packets such as the size of the packet and the time the packet was sent. Flow

level features were derived from sequences of packets that shared common field values,

such as source IP address, destination IP address, and protocol type. Connection level

features were derived from transport-layer protocol information such as those found in TCP

connections. Of these features, average packet length and flow duration were found to be

the most valuable. Moore and Zuev were able to further improve on traffic classification

accuracy by using Naïve Bayes [23]. The dataset was manually classified beforehand, and

over 240 features were used to train the classifier. Using Naïve Bayes alone, the study

achieved approximately 65% accuracy. The classifier was then refined to reduce the

number of features, and accuracy was improved to over 95%. Auld et al. extended this

work by applying a Bayesian neural network (NN), further improving classification

accuracy [24]. The training NN achieved a classification accuracy up to 99% for data

trained and tested on the same day, and 95% for data trained and tested with an eight-month

gap. The features were ranked for value, and the top three features were found to be: count

30

of TCP PUSH packets, total number of bytes in the initial window (TCP handshake) from

client to server, and total number of bytes in the initial window (TCP handshake) from

server to client.

Wireless traffic adds a layer of complexity in traffic analysis. The most ubiquitous

wireless network standard, IEEE 802.11 (commonly known as Wi-Fi), uses encryption in

its wireless transmissions to protect against external eavesdropping. Nevertheless,

Atkinson showed that Wi-Fi can still leak private user information using only side-channel

information similar to those exploited in wired networks [25]. A Random Forests classifier

was trained on a dataset of Skype activity with around 60,000 observations and 600

features, and achieved around 97% accuracy. Furthermore, it was shown that a

classification accuracy of greater than 95% could be accomplished using only 200 variables

and 20 trees. The most valuable features were discovered to be the amount of time between

Sent frames, and the amount of time between a Received frame and the previous Sent

frame.

As for Internet of Things (IoT) devices, there is a shortage of wireless traffic

analysis in the current literature. This is likely due to the relative novelty of the IoT. A

study by Copos et al. analyzed two IoT devices, the Nest thermostat and Nest Protect smoke

detector, and succeeded in determining the devices’ Home and Away modes with 88% and

67% accuracy respectively [26]. Beyer expanded on the use of wireless traffic analysis by

showing that pattern-of-life profiling is possible using sniffed BLE and Wi-Fi traffic [2].

By analyzing both incoming and outgoing packet flows of three device types (outlet,

sensor, and cameras), it was shown that 17 out of 18 IoT devices could be classified. Beyer

31

used frame size, measured in bytes, as a primary feature. Meidan et al. is the first study

found that applied machine learning to IoT device identification [27]. The goal of the study

was to determine whether a network device was a personal computer, a smartphone or an

IoT device. A dataset was produced using 802.11 wireless traffic from two PCs, two

smartphones, and ten IoT devices, with device types including baby monitors, refrigerators,

security cameras, thermostats, and smart outlets. A classifier was then trained using a

combination of gradient boosting and Random Forests techniques, and was able to classify

an IoT device with greater than 99% accuracy. A robust analysis on feature importance

was not performed. Wang et al. is the latest study to apply machine learning techniques to

traffic analysis of IoT devices [28]. A software-defined network (SDN) framework was

developed capable of efficient network quality-of-service management. This was achieved

through the use of deep-learning-based traffic analysis able to classify encrypted data

traffic to various applications (e.g., email, Skype video calls, Spotify music streaming,

etc.). Three deep-learning techniques were evaluated: multilayer perceptron, stacked

autoencoder, and convolutional neural networks.

2.6 Terminology

The following terms are frequently used throughout this thesis, and are defined here

for the purpose of clarity:

• Classification Algorithm/Classifier: a classification technique used to

distinguish between response classes. The classification algorithms used in

this thesis are limited to k-nearest neighbors (KNN), linear discriminant

analysis (LDA), and random forests (RF).

32

• Device Type: the broad functional purpose of an IoT device, e.g., cameras,

smart plugs, locks. The device type is the response class the classifiers in

this thesis are attempting to identify.

• Response Class: the output of supervised classification classifiers. The

response class referred to in this thesis is the device type.

• Test dataset: a subset of the dataset reserved to evaluate the classifier’s

performance. The test dataset is never used to train the classifier.

• Training dataset: a subset of the dataset used to train the classifier.

2.7 Background Summary

This chapter provides a concise summary on the BLE and Wi-Fi protocols that IoT

devices use. It explains the underlying architecture of these protocols and how their

internal structure apply to machine learning classification. It offers a brief overview of

machine learning. It gives a brief survey into the background research on wired traffic

analysis and how machine learning has been applied in those efforts. While extensive

research has been done in traditional wired traffic, there is a current shortage on IoT traffic

analysis. This thesis contributes to the areas of IoT security and machine learning by

showing how classification techniques may successfully identify IoT devices using

information leaked in conventional wireless traffic.

33

III. System Design

3.1 Overview

This research offers two contributions in analyzing the classification of IoT devices

using wireless traffic analysis: a Smart Home Environment (SHE) and a Device

Classification Pipeline (DCP). SHE is a system of actual IoT devices that produces

authentic smart home traffic by integrating various BLE and Wi-Fi commercial devices in

one physical space. To study the wireless traffic produced by SHE, DCP is used to collect

and preprocess the wireless traffic into workable data, extract features from the data, and

produce tuned classifier models ready for testing. Lastly, data exploration is done to gain

an initial understanding of the original dataset before any classification is performed. This

chapter presents a detailed explanation of SHE, each component of DCP, and the products

of data exploration on the datasets.

3.2 Smart Home Environment (SHE)

SHE is designed to produce actual wireless IoT traffic for analysis. SHE consists

of a controller device, and a variety of BLE and Wi-Fi devices. Wireless traffic produced

by SHE is collected using sniffing equipment.

3.2.1 Controller

An iPhone 6S, using iOS version 12.1, serves as the central controller for all devices

in SHE. The iPhone controls a given device using the manufacturer-developed application.

All applications can be obtained via the App Store. When the user is within range of SHE,

the iPhone connects to the Internet via the Wi-Fi router and gains access to the devices in

SHE. To interface with BLE devices, the iPhone’s Bluetooth must be turned on. Once

34

connected through Bluetooth, the user can control the BLE devices using the device’s

application.

3.2.2 Wi-Fi Devices

To provide connectivity to Wi-Fi devices, a 2.4 GHz AP is set up with an SSID of

“yosemite" and Wi-Fi Protected Access II (WPA2) security. The AP used is a Netgear

Nighthawk X4S R7800 router, as shown in Figure 11. Figure 12 provides the list of AP

settings.

Figure 11. Netgear Nighthawk X4S R7800 router used as AP

Figure 12. Wi-Fi Access Point Settings

35

All Wi-Fi devices, identified as 𝑤𝑤2 - 𝑤𝑤16, are connected to the AP. SHE contains

five cameras, seven smart plugs, and three light bulbs across four different manufacturers.

Table 2 shows the list of Wi-Fi devices used in SHE, including details such as device type,

device manufacturer, device name, and MAC address. Device MAC addresses are found

by checking the labels on the physical device. Appendix A provides complete device

details, including a device’s model number and serial number.

Table 2. Wi-Fi Devices

ID Manufacturer Device Type Device Name MAC Address 𝑤𝑤1 Netgear Router Yosemite 78:D2:94:4D:AB:3E 𝑤𝑤2 Belkin Camera Netcam1 EC:1A:59:E4:FD:41 𝑤𝑤3 Belkin Camera Netcam2 EC:1A:59:E4:FA:09 𝑤𝑤4 Belkin Camera Netcam3 EC:1A:59:E5:02:0D 𝑤𝑤5 Dropcam Camera Dropcam 30:8C:FB:3A:1A:AD 𝑤𝑤6 TPLink Camera Kasa AC:84:C6:97:7C:CC 𝑤𝑤7 Belkin Plug Insight 14:91:82:24:DD:34 𝑤𝑤8 Belkin Plug Mini 60:38:E0:EE:7C:E5 𝑤𝑤9 Belkin Plug Switch1 14:91:82:CD:DF:3D 𝑤𝑤10 Belkin Plug Switch2 B4:75:0E:0D:94:65 𝑤𝑤11 Belkin Plug Switch3 B4:75:0E:0D:33:D5 𝑤𝑤12 Belkin Plug Switch4 94:10:3E:2B:7A:55 𝑤𝑤13 TPLink Plug TpPlug 70:4F:57:F9:E1:B8 𝑤𝑤14 Lifx Light Bulb Lifx1 D0:73:D5:26:B8:4C 𝑤𝑤15 Lifx Light Bulb Lifx2 D0:73:D5:26:C9:27 𝑤𝑤16 TPLink Light Bulb TpBulb B0:4E:26:C5:2A:41

3.2.3 BLE Devices

To control BLE devices 𝑏𝑏1 - 𝑏𝑏11 , the iPhone is used as a Bluetooth master device.

SHE contains three locks, four door sensors, and four temperature sensors across five

different manufacturers. Table 3 shows the BLE devices used in the smart home

36

environment, including details such as device type, device manufacturer, and device name.

Appendix A provides complete device details, including a device’s model number and

serial number.

Table 3. BLE Devices

ID Manufacturer Device Type Device Name 𝑏𝑏1 August Lock August1 𝑏𝑏2 August Lock August2 𝑏𝑏3 Kwikset Lock Kevo 𝑏𝑏4 BLE Home Door Sensor Home1 𝑏𝑏5 BLE Home Door Sensor Home2 𝑏𝑏6 Eve Door Sensor Door1 𝑏𝑏7 Eve Door Sensor Door2 𝑏𝑏8 Eve Temperature Sensor Room1 𝑏𝑏9 Eve Temperature Sensor Room2 𝑏𝑏10 Eve Temperature Sensor Weather 𝑏𝑏11 SensorPush Temperature Sensor Push

3.2.4 Device Actions

To produce sufficient traffic volume for machine learning, devices are set to

perform actions expected in a smart home environment. Actions can be programmed or

triggered. Programmed actions are set up by the user to occur on a scheduled time and

interval, while triggered actions are performed by devices upon an event. Events can occur

at any time during the experimentation. To reduce variability, devices of the same device

type are programmed to perform similar actions at identical times or events. Table 4 lists

the complete set of actions used in SHE.

37

Table 4. Device Actions

Device Name Device Type Actions Schedule/Event 1 Dropcam Camera Send email notification Motion detected 2 Kasa Camera Send email notification Motion detected 3 Netcam1 Camera Send email notification Motion detected 4 Netcam2 Camera Send email notification Motion detected 5 Netcam3 Camera Send email notification Motion detected 6 Lifx1 Light Bulb Turn on, Turn off Hourly, on the hour 7 Lifx2 Light Bulb Turn on, Turn off Hourly, on the hour 8 TpBulb Light Bulb Turn on, Turn off Hourly, on the hour 9 TpPlug Plug Turn on, Turn off Hourly, on the hour 10 Insight Plug Turn on, Turn off Hourly, on the hour 11 Mini Plug Turn on, Turn off Hourly, on the hour 12 Switch1 Plug Turn on, Turn off Hourly, on the hour 13 Switch2 Plug Turn on, Turn off Hourly, on the hour 14 Switch3 Plug Turn on, Turn off Hourly, on the hour 15 Switch4 Plug Turn on, Turn off Hourly, on the hour 16 Door1 Door Sensor Report door state Continuous 17 Door2 Door Sensor Report door state Continuous 18 Home1 Door Sensor Report door state Continuous 19 Home2 Door Sensor Report door state Continuous 20 August1 Lock Report lock state Continuous 21 August2 Lock Report lock state Continuous 22 Kevo Lock Report lock state Continuous 23 Room1 Temp Sensor Report temperature Continuous 24 Room2 Temp Sensor Report temperature Continuous 25 Weather Temp Sensor Report temperature Continuous 26 Push Temp Sensor Report temperature Continuous

3.2.5 Device Location and Setup

All device locations are kept constant in SHE, except for the iPhone controller

which typically is on the user and is therefore not at a fixed location. Figure 13 shows the

location of each device in SHE. Wi-Fi devices are indicated by boxes with solid outlines,

and BLE devices are indicated by boxes with dashed outlines. The area containing SHE is

38

a one-bedroom apartment with three doors, one set of large windows, a large table, and a

couch; these items are indicated by grey boxes. Devices locations are placed near power

outlets to reduce the need for additional power cords.

Figure 13. SHE Device Locations (not to scale)

Figures 13 to 17 show how various devices are set up. Not all devices are shown

in the figures, but the overall configuration is similar across device types. Door sensors

are placed along the edges of doors, with their door sensor magnets located across them.

Light bulbs are installed into individual bulb sockets. Locks are installed into 3D-printed

39

door lock holders with actual door lock parts included. Plugs are installed into either power

strips or outlets, with no plugged-in device. Temperature sensors are placed in a flat

surface with open space around them. Cameras are placed around a motion source to

provide motion for their sensing capabilities. Figure 38 provides a photograph of the

camera device setup.

Figure 14. Door Sensor Setup

40

Figure 15. Light Bulb Setup

Figure 16. Lock Setup

41

Figure 17. Plug Setup

Figure 18. Temperature Sensor Setup

42

3.3 Device Classification Pipeline (DCP)

DCP is designed to collect and preprocess the wireless traffic into workable data,

extract features from the data, and produce tuned classifier models ready for testing. Figure

19 shows the DCP system diagram and its three components: (i) data collection, (ii) data

preprocessing, and (iii) model tuning. The outputs of DCP are tuned linear discriminant

analysis (LDA), k-nearest neighbors (KNN), and random forests (RF) classifiers for model

testing, as well as the test dataset. The following sections describe each component and

their functions.

Data collection is accomplished using user-inputted commands and scripts, as

described in Section 3.3.2, while data preprocessing and model tuning are accomplished

using the MulticlassDCP class written in Python. The MulticlassDCP class contains two

sub-classes, BLEMulticlassDCP and WifiMulticlassDCP, one for each protocol.

Figure 19. DCP System Diagram

43

3.3.1 Data Collection Hardware

A workstation is used to run all components of DCP. The workstation is an Acer

Aspire E15 with a 64-bit Intel Core i5-6200U 2.3 GHz processor, 8 GB DD4 RAM, 256

GB solid-state hard drive, and runs Kali Linux 2018.4 as the operating system. The

scanning and sniffing equipment consists of the Plugable Bluetooth adapter, three BLE

sniffers (Ubertooth One with firmware 2018-08-R1), and a long-range dual-band Wi-Fi

adapter (Alfa AWUS036ACH); all equipment is connected to the workstation using

Universal Serial Bus (USB). Each Ubertooth One sniffer uses a 2.4 GHz 2.2 dBi antenna,

while the Alfa card uses a 2.4 GHz and 5 GHz dual-band dipole antenna. Figure 20

provides images for these equipment.

Figure 20. Scanning and Sniffing Equipment. Plugable Bluetooth adapter (left),

Alfa AWUS036ACH Wi-Fi adapter (center), and Ubertooth One BLE sniffer (right)

[29]–[31]

44

3.3.1.1 Sniffer Distance Separation

All sniffers, Wi-Fi and BLE, operate in the 2.4 GHz band and must be horizontally

isolated to prevent interference. The required distance between antennae, 𝑑𝑑, to ensure

horizontal isolation is given by the Fraunhofer distance equation

𝒅𝒅 ≥ 𝟐𝟐𝑫𝑫𝟐𝟐

𝝀𝝀 (1)

where 𝐷𝐷 is the antenna length in meters and 𝜆𝜆 is the wavelength of the device frequency

band in Hz [32]. The Ubertooth One sniffers have 3.5 inch long antennae and operate at

an average wavelength of 2441 MHz, and the Alfa card has 6.5 inch long antennae and

operate at an average wavelength of 2412 MHz. Applying these values to (1) yields a

separation distance of about 5 inches for the Ubertooth One sniffers and 17 inches for the

Alfa card antenna. Separation distances are maintained by affixing the Ubertooth One

sniffers on a wooden board and ensuring that the Alfa card is located at an appropriate

distance, as shown in Figure 21.

3.3.2 Data Collection

Data collection supplies the raw data needed for data preprocessing. Data

collection occurs in two steps: scanning and sniffing. Scanning collects necessary

information required for sniffing and analysis, and sniffing gathers BLE and Wi-Fi wireless

traffic and stores them into packet capture files (pcap). The scanning and sniffing

procedures vary for BLE and Wi-Fi devices.

45

Figure 21. Sniffer Layout

3.3.2.1 Wi-Fi Devices

Wi-Fi scanning is used to find the AP MAC address, AP channel, and associated

Wi-Fi device MAC addresses. Prior to scanning, the Alfa card is plugged into the

workstation via USB. Figure 22 shows the commands used to prepare the Alfa card for

scanning (note that the specific wireless interface “wlan1” may vary on other devices):

(i) airmon-ng check kill − end any processes that may affect operation,

46

(ii) ifconfig wlan1 down − turn off the wireless interface,

(iii) iwconfig wlan1 mode monitor − set wireless interface to monitor mode,

(iv) ifconfig wlan1 down − turn on the wireless interface, and

(v) iwconfig − verify that the changes occurred successfully. The wireless

interface (in this case, wlan1) should be set to monitor mode, as indicated

by the red box.

Figure 23 shows the command used to scan for APs. This scan discovers the

following information: (1) the target AP’s MAC address, (2) AP channel, and (3) AP SSID.

The next step is scanning for all Wi-Fi devices associated with the target AP. Figure

24 shows the command used to accomplish this. The resulting list of MAC addresses is

compared against the list in Table 2 to ensure all Wi-Fi devices are detected.

Figure 22. Commands to prepare Alfa card for data collection

47

Figure 23. Commands used to scan for Wi-Fi AP

Wi-Fi sniffing is used to collect wireless traffic from the Wi-Fi devices. Prior to

scanning, ensure that the Alfa card is first set to monitor mode (see Figure 22. If monitor

mode is not set, execute all the commands in Figure 22). The airodump-ng tool from the

aircrack-ng suite is then used to capture raw Wi-Fi frames. To use airdump-ng, the wireless

interface (“wlan1”), output file format (“pcap”), target AP MAC address

(“78d2944dab3e”), and target AP channel (“9”) must be set. The command is

airodump-ng wlan1 –o pcap –w wifi –bssid 78d2944dab3e –c 9

48

Figure 24. Command used to scan for Wi-Fi devices associated to the AP

3.3.2.2 BLE Devices

BLE scanning is used to discover device names and device addresses from

advertising devices. Prior to scanning, the Plugable adapter is connected via USB. Figure

25 shows the commands used to scan for advertising BLE devices:

(i) service bluetooth start − start the Bluetooth service,

(ii) hciconfig hci1 up − open and initialize the Bluetooth device, and

(iii) hcitool lescan− scan for BLE devices.

49

Figure 25. Commands used to scan for BLE devices

BLE sniffing is used to collect wireless traffic from the BLE devices. Prior to

scanning, three Ubertooth One sniffers are connected to the workstation via USB. Each

Ubertooth One device (“U0” – “U2”) is set to sniff on one of three advertisement channels

(“A37”-“A39”), to follow connections (“f”), and to create a pcap output file (“q”). To

operate a single Ubertooth One, the command is

ubertooth-btle –f –U0 –A37 –qble.pcap

3.3.3 Data Preprocessing

Data preprocessing changes the collected raw data into a dataset suitable for

machine learning classifiers. Data preprocessing serves two objectives: feature extraction,

and data transformation. Feature extraction is used to build numerical and categorical

values (features) that represent information contained in the raw data. Data transformation

is used to format features into a representation more suitable for the machine learning

50

classifiers. Data preprocessing varies between Wi-Fi and BLE devices, and are discussed

separately. Scaling is not performed for any numerical feature.

3.3.3.1 Wi-Fi Preprocessing

Wi-Fi preprocessing is used to create a dataframe containing the device type

response class, associated packet count feature, packet length feature, packet subtype

features, vendor features, device name, set, source address, and packet time.

After data collection, the Wi-Fi pcap files are combined into one master pcap file,

then parsed using pyshark, a Python wrapper for Wireshark packet dissectors. The list of

device MAC addresses generated during data collection is then used to create a comma-

separated values (csv) file for each known Wi-Fi device. The MAC addresses are collected

to determine which devices are part of SHE, but are not used in the classification process.

Information about packet time, packet length, and data packet subtype are extracted from

each data packet. Time refers to the time the packet was transmitted and is measured in

epoch time. Time is not used as a feature and is used for organizational purposes. Packet

length is a numerical feature that refers to the size of the entire 802.11 packet and is

measured in bytes. Data packet subtype is a categorical feature corresponding to the type

of 802.11 data frame used. Frames of subtype 32 are data frames, and are the basic frame

type used in data transmission. Frames of subtype 40 and 44 are quality-of-service (QoS)

frames, which support latency-sensitive applications such as video and voice-over-IP [33].

QoS data frames (subtype 40) contain higher-protocol data, and function similarly to the

standard data frame type. QoS null frames (subtype 44) are frames that transmit no data,

but only frame information. QoS null frames are typically used by STAs to notify the AP

51

that the STA is entering a power-save mode. Source MAC addresses are extracted but are

not used as features in classification. These features are stored in the csv file of the source

device. All other 802.11 packet types and packets with a source address not in the list of

device MAC addresses are not used for classification and are not stored in the csv files.

Once all packets in the master pcap are parsed, the csv files are read into a dataframe

created using pandas, an open-source library that provides high-performance structures for

data analysis. Each row in the dataframe represents a single 802.11 packet, and each

column in the dataframe represents a feature. The device type response class is added by

mapping the source MAC address to a pre-built dictionary. Derived features are then

produced using existing features. The vendor feature is produced by mapping the source

MAC address to a vendor lookup application programming interface (API) from

macvendors.co that returns the vendor name of the wireless chip as registered in the IEEE

Standards Association [29][30]. Table 5 provides the complete list of Wi-Fi vendors. The

wireless chip used by a certain device is not necessarily tied to the vendor, therefore

differences between the device vendor and chip vendor are possible. For example, the Lifx

light bulbs use wireless chips produced by Lifi.

Table 5. Wi-Fi Vendor List

Vendors 1 Belkin 2 Dropcam 3 Lifi 4 TP-Link

52

The associated packets feature is then extracted. The associated packets feature is

a numerical feature that refers to the number of packets of a given device sent within one

second of each other, and is calculated using the transmission time feature. Categorical

features are one-hot encoded to allow for classification by algebraic classifiers (i.e., LDA).

One-hot encoding transforms a single categorical feature with 𝑘𝑘 categories into 𝑘𝑘 features

where binary values are used to represent inclusion in a given category. Figure 26 provides

an example using the data packet subtype feature. In the example, three packets each have

a different data packet subtype stored in the categorical feature DataSubtype. Through

one-hot encoding, the DataSubtype feature is transformed into three separate features, each

corresponding to the three data subtypes. For convenience during preprocessing, the

device names as listed in Table 2 are added as a dataframe variable to easily identify the

source of a given packet, and is not used for classification.

Figure 26. One-Hot Encoding

Packets are assigned to either the training set or test set, based on their source

device. Packets belonging to devices in the training set are used to build the classifiers,

and packets belonging to devices in the test set are used to evaluate the performance of the

classifiers. Table 7 provides the complete list of Wi-Fi device set assignments. The camera

53

device type has three devices in the training set, the light bulb device type has two, and the

plug device type has five devices. The camera and plug device types each have two devices

in the test set, while the light bulb device type has one. Device availability is the primary

reason for the dissimilarity in device count.

Table 6 summarizes the dataframe columns produced in Wi-Fi preprocessing. The

name of the dataframe column is provided, along with its use (categorical feature,

numerical feature, dataframe variable which is not used for classification but used for

organizational purposes only, or response class), definition, and unit or accepted values.

Table 6. Wi-Fi Dataframe Columns

Attribute Name Definition Value Type/ ML

Use Unit/Values

1 Associated Packet Count

Number of packets of a device sent within one

second Numerical Feature Packets per

second

2 Device Name Name given to device by user Information See Table 2

3 Device Type Category of device Response Class See Table 2 4 Packet Length Size of 802.11 packet Numerical Feature Bytes

5 Packet Subtype 802.11 data packet type Categorical Feature

[Data, QoS data, QoS null]

6 Set Assignment of device as training or test device Information See Table 7

7 Source Address MAC address of source device Information See Table 2

8 Time Time of packet transmission Information Epoch Time

9 Vendor Vendor of wireless chip Categorical Feature See Table 5

54

Table 7. Wi-Fi Device Set Assignment

Device Type Training Set Test Set

Camera Dropcam Kasa Netcam1 Netcam3 Netcam2

Light Bulb Lifx1 Lifx2 TpBulb

Plug

Insight TpPlug Switch1 Switch4 Switch2

Switch3 Mini

3.3.3.2 BLE Preprocessing

BLE preprocessing is used to create a dataframe containing the device type

response class, associated packet count feature, packet length feature, BLE link layer

header length feature, protocol data unit (PDU) type feature, radio frequency (RF) channel

number feature, device name, set, and packet time.

Similar to Wi-Fi preprocessing, the BLE pcap files are combined into one master

pcap file, and parsed using pyshark. The link layer device names and advertising

addresses are first extracted during parsing, but are not included as features. Instead, they

are used to identify which packets belong to the known devices in SHE. A csv file is then

created for each known BLE device. The following information is extracted from each

BLE packet belonging to a known device: time, length, RF channel, link layer (LL) packet

length, and PDU type. Time refers to the time the packet was transmitted, and is measured

in epoch time. Time is not used as a feature and is used for organizational purposes. Length

refers to the size of the entire BLE packet, and is measured in bytes. RF channel is a

55

categorical feature with values 0, 12, and 39, each corresponding to the radio frequency

channel from which the packet was sniffed. LL packet length refers to the length of the

BLE link layer header, and is measured in bytes. PDU type is a categorical feature with

values corresponding to each advertising PDU type (see Section 2.3.2). These features are

then stored in the csv file of the source device. All other BLE packet types and packets

from unknown devices are not used for classification and are not stored in the csv files.

Once all packets in the master pcap are parsed, the csv files are read into a pandas

dataframe, with each row in the dataframe representing a single BLE packet, and each

column in the dataframe representing a feature. The device type response class is added

by mapping either the LL device name or advertising address to a pre-built dictionary. The

associated packets feature is then derived using the same method described in Section

3.3.3.1. All categorical features are then one-hot encoded. For convenience during

preprocessing, the device names as listed in Table 3 are added in the dataframe to easily

identify the source of a given packet, and are not used for classification.

Packets are then assigned to either the training set or test set, based on their source

device. Packets belonging to devices in the training set are used to build the classifiers,

and packets belonging to devices in the test set are used to evaluate the performance of the

classifiers. Table 8 provides the complete list of BLE device set assignments. The door

sensor and temperature sensor device type each have three devices in the training set, while

the lock device types has two devices in the training set. All device types have one device

in the test set. Device availability is the primary reason door sensors have an extra device.

56

Table 8. BLE Device Set Assignments

Device Type Training Set Test Set

Door Sensor Home1 Door2 Home2

Door1

Temp Sensor Room1 Room2 Push Weather

Lock August1 August2 Kevo

Table 9 summarizes the dataframe columns produced in BLE preprocessing. The

name of the dataframe column is provided, along with its type (categorical feature,

numerical feature, dataframe variable which is used for organizational purposes only, or

response class), definition, and unit or accepted values.

57

Table 9. BLE Dataframe Columns

Attribute Name Definition Value Type/ ML Use Unit/Values

1 Associated Packet Count

Number of packets of a device sent within one

second

Numerical Feature

Packets per second

2 Device Name Name given to device by user Information See Table 3

3 Device Type Category of device Response Class See Table 3

4 Link Layer Header Length Length of BLE link layer Numerical

Feature Bytes

5 Packet Length Size of BLE packet Numerical Feature Byte

6 PDU Type Advertising PDU Type Categorical Feature See Table 1

7 RF Channel RF Channel on which packet was sent

Categorical Feature [0, 12, 39]

8 Set Assignment of device as training or test device Information See Table 8

9 Time Time of packet transmission Information Epoch Time

3.3.4 Model Tuning

Model tuning uses the preprocessed datasets to find the classifier hyperparameters

with the best performance during cross-validation. The optimal hyperparameters are then

used to create classifier models for model testing. The classification algorithms used are

implemented by scikit-learn version 0.20, an open-source machine learning package

written in Python [36].

Models for the BLE and Wi-Fi datasets are tuned separately, however the tuning

strategy for both protocols is identical. A range of possible hyperparameter values is

selected and evaluated using 10-fold cross-validation grid search, an exhaustive search that

evaluates each model using all hyperparameter values. The scoring metric used to evaluate

58

the models is the Matthews correlation coefficient (MCC) (see Section 4.4). The MCC

metric provides a measure of a classifier’s overall classification performance. The MCC

metric exists in the range [−1, 1], where −1 represents perfect misclassification and 1

represents perfect classification. The MCC metric is chosen over the traditionally used

accuracy metric because accuracy provides misleading information in imbalanced datasets,

such as this one. MCC not only accounts for class imbalances, but also provides a

convenient range of values to evaluate classifier performance.

Table 10 provides the complete list of the hyperparameters considered during

tuning. The KNN hyperparameter, n_neighbors, determines the number of neighbors

KNN considers in its classification of a given observation. The values used are the odd

numbers in the range 1 to 19. Larger values are originally tested but required significant

times to complete. The value range 1 to 19 provides a reasonable tuning range without

increasing computation costs significantly. Two RF hyperparameters are tuned:

max_features and n_estimators. The max_features hyperparameter determines the

size of random subsets of features RF considers when splitting a node [36]. The values

used are 2, 3, 5, 7, and 9 for Wi-Fi tuning and 2, 4, 7, 9, and 12 for BLE tuning. These

values are chosen because they are evenly-spaced integers that do not exceed the total

number of features used, which is 9 features for Wi-Fi tuning and 12 features for BLE

tuning. The n_estimators hyperparameter determines the number of decision tree the RF

model builds. The values used are 10, 15, 20, and 25 for both BLE and Wi-Fi tuning.

While other hyperparameters are available, the tuning process is limited to these

59

hyperparameters to allow for efficiency and speed, as increasing the number of

hyperparameters compounds the amount of time needed for tuning.

No hyperparameter tuning is necessary for the LDA classifier. However, unlike

the KNN and RF classifiers which are able to inherently handle imbalanced multi-

classification tasks, the LDA classifier requires additional information. Because of the

class imbalances in the test dataset, the class prior probabilities are required by the LDA

classifier. The class prior probabilities for the Wi-Fi dataset are 61.68%, 37.82%, and

0.51% for the plug, camera, and bulb device types respectively. The class prior

probabilities for the BLE dataset are 59.06%, 23.40%, and 17.54% for the door sensor,

temperature sensor, and lock device types respectively.

Table 10. Hyperparameters used in Grid Search

Hyperparameter Name Grid Values KNN N_neighbors [1 3 5 7 9 11 13 15 17]

RF (Wi-Fi) Max_features [ 2 3 5 7 9] RF (BLE) Max_features [ 2 4 7 9 12] RF (All) N_estimators [10 15 20 25]

Hyperparameter tuning is performed using a Jupyter notebook written in Python

(see Appendix B) [37]. For both Wi-Fi and BLE datasets, the script preprocesses the data

and tunes an untuned KNN and RF classifier using the hyperparameter values in Table 10.

When finished, the script reports the hyperparameter values that produced the best

performing classifiers. Table 11 provides the results of the cross-validation

hyperparameter tuning process. The hyperparameters that performed best on the Wi-Fi

dataset are presented in the left side of the table, while the hyperparameters for the BLE

60

dataset are presented in the right. Both Wi-Fi classifiers achieved an MCC of over 0.75,

showing that when tuned to these hyperparameters, the classifiers successfully attain a high

level of performance. As for the BLE classifiers, both the KNN and RF classifiers also

reach a high level of performance on the MCC metric. The hyperparameter values used

show no signs of improvement past the range of values used, and so it is concluded that

these hyperparameter values are used for experimental testing.

Table 11. Best-Performing Hyperparameters

Wi-Fi BLE Hyperparameter Values MCC Hyperparameter

Values MCC

KNN N_neighbors = 11 0.961 N_neighbors = 5 0.960

RF Max_features = 2 0.975 Max_features = 7 0.961 N_estimators = 20 N_estimators = 20

3.4 Data Exploration

Data exploration is performed to gain an initial understanding of the original dataset

before any classification is performed. Data exploration can provide valuable insights in

classification. The datasets used in data exploration are the original preprocessed datasets.

All graphs in this section are organized similarly. The x-axis is the device type, divided

into separate bins. Bin range values are chosen to highlight significant properties in the

data. The y-axis, expressed in powers of 10, is the count of packets that belong to each

corresponding bin.

61

3.4.1 BLE Data Exploration

The BLE dataset is comprised of six features: packet length, BLE link layer header

length, associated packet count, radio frequency (RF) channel number, and protocol data

unit (PDU) type.

Figure 27 shows the packet length feature, measured in bytes, across the BLE

device types. The figure is organized into four bins: 20 to 40 bytes, 40 to 60 bytes, 60 to

80 bytes, and 80 to 100 bytes. One immediate insight can be observed: if a BLE packet

length is greater than 80 bytes, it must be a lock device because it is the only device with

packet lengths greater than 80 bytes. No other significant insights are easily discernable.

Figure 27. BLE Packet Length

Figure 28 shows the BLE link layer header length feature, measured in bytes, across

the BLE device types. The figure is organized into three bins: 3 bytes to 100 bytes, 100

bytes to 150 bytes, and 150 bytes to 250 bytes. BLE link layer header length appears to be

equally distributed across device types. No significant insights are easily discernable.

62

Figure 28. BLE Link Layer Header Length

Figure 29 shows the associated packet count feature, measured in packets per

second, across the BLE device types, and is organized into three bins: 0 to 20, 20 to 40,

and 40 to 60. BLE associated packet count appears to have some observable patterns. If a

packet has an associated packet count of over 40 packets/sec, it must be a door sensor

device. Furthermore, associated packet counts of 20 to 40 packets/sec are heavily

associated with lock devices.

63

Figure 29. BLE Associated Packet Count

Figure 30 shows the RF channel number feature across the BLE device types, and

is composed of three categories: channel 0, channel 12, and channel 39. The BLE RF

channel feature appears to be equally distributed across device types. No significant

insights are easily discernable.

Figure 30. BLE RF Channels

64

Figure 31 shows the PDU type feature across the BLE device types and is composed

of seven categories (see section 2.3.2). BLE PDU type appears to have one significant

pattern. Three PDU types, scan requests (SCAN_REQ), advertising direct indications

(ADV_DIRECT_IND), and connection requests (CONNECT_REQ) are only used by door

devices. Temperature sensor and lock devices share PDU types and are similar in

distribution.

To summarize BLE data exploration, the packet length, associated packet count,

and PDU type features are observed to have clear classification value. If the packet length

is greater than 80 bytes, the packet belongs to a lock device. If the associated packet count

is greater than 40, the packet belongs to a door sensor device. If a packet uses one of three

certain PDU types, the packet belongs to a door sensor. The BLE link layer header length

and RF channel number features appear to be similarly distributed across the three device

types.

Figure 31. BLE PDU Types

65

3.4.2 Wi-Fi Data Exploration

The Wi-Fi dataset is comprised of four features: packet length, vendor, associated

packet count, and packet subtype.

Figure 32 shows the packet length feature, measured in bytes, across the Wi-Fi

device types. The figure is organized into three bins: 25 to 100 bytes, 100 to 500 bytes,

and 500 to 1550 bytes. It can be observed that only camera devices have packets over 500

bytes. Additionally, plug and camera devices tends to use smaller packets, while bulb

devices tend to use larger packets.

Figure 33 shows the vendor feature across the Wi-Fi device types, and is composed

of six categories (see Table 5). The Wi-Fi vendor feature appears to have observable traits

that can be used in classification. TP-link is the only vendor that produces all three device

types. Belkin produces both plug and camera devices, but not bulbs. Lifi and Dropcam

both only produce one device type each. The vendor feature may prove to be a powerful

feature that can be leveraged by the classifiers.

66

Figure 32. Wi-Fi Packet Length

Figure 33. Wi-Fi Vendors

Figure 34 shows the associated packet count length feature, measured in packets

per second, across the Wi-Fi device types, and is organized into three bins: 0 to 30

packets/sec, 30 to 100 packets/sec, and 100 to 200 packets/sec. Wi-Fi associated packet

count appears to have some observable patterns. Only bulb devices have packets with an

67

associated packet count over 100 packets/sec. Furthermore, between plug and camera

devices, only camera devices have associated packet count of 30 to 100 packets/sec.

Similar to the vendor feature, the Wi-Fi associated packet count appears to be a feature

with classifying potential.

Figure 35 shows the packet subtype feature across the Wi-Fi device types, and is

composed of three categories: data, quality-of-service (QoS) data, and QoS null. One clear

observation can be made: bulbs do not use QoS null packets. Plug and camera devices

appear to share similar packet subtype distributions.

Figure 34. Wi-Fi Associated Packet Count

68

Figure 35. Wi-Fi Packet Subtype

To summarize Wi-Fi data exploration, all features are observed to have

classification value. If the packet length is greater than 500 bytes, the packet belongs to a

camera device. If the vendor is Belkin, the packet belongs to either a plug or camera device.

If the vendor is Lifi, the packet belongs to a bulb device. If the vendor is Dropcam, the

packet belongs to a camera device. If the associated packet count is greater than 100, the

packet belongs to a bulb device. Lastly, if the packet is of the QoS null subtype, the packet

does not belong to a bulb device.

3.5 Design Summary

This chapter describes each component of SHE and DCP. Their design enables the

use of authentic smart device wireless traffic in the classification of IoT devices. This

chapter also explores the features of the datasets, looking for patterns that may be useful in

gaining an understanding of the classification task.

69

IV. Methodology

4.1 Problem/Objective

The goal of this experiment is to evaluate the effectiveness of machine learning

classifiers at identifying IoT devices using wirelessly collected traffic. The experiment

discussed in this section evaluates these classifiers using a set of performance metrics. The

experiment attempts to complete three objectives:

1. Determine the ability of a classifier to classify a given packet to a device type.

2. Measure the performance of a classifier in identifying the device types of a smart

home environment.

3. Determine which features are most useful for classification.

4.2 System under Test

The system under test (SUT) diagram is shown in Figure 36. The system under test

are the tuned classifiers. Within the SUT are the three components under test: the KNN,

LDA and RF classifiers. The response variables and metrics used to evaluate the

classifiers’ performance include the confusion matrix, feature importance score, Matthews

correlation coefficient, mean precision, and mean recall, and are described in Sections 4.3

and 4.4. Controlled variables are discussed in Section 4.5. The collected BLE and Wi-Fi

traffic are considered uncontrolled and are analyzed in Section 4.6. The parameters are

variables that remain unchanged throughout the experiment and are studied in Section 4.7.

The experimental factors are the variables that change between experimental trials, and are

discussed in Section 4.8.

70

Figure 36. System under test diagram

4.2.1 Assumptions

The following assumptions are made throughout the design and execution of the

experiments of the SUT:

1. The activities done by the devices in the smart home characterize a real-life

smart home environment.

2. Each device in the test setup is unique, and the sniffer is not misrepresenting

any collected data.

3. The sniffer has necessary network knowledge to collect wireless traffic, to

include access point MAC address, access point channel, BLE device MAC

addresses, and BLE device names.

4. Devices in a test setup are not interfering with each other in any significant

manner.

5. Outside noise is negligible.

6. Devices do not apply additional security mechanisms.

4.3 Response Variables

Response variables are the direct outputs of the experiment, and are used to

calculate the metrics. The objectives of the experiment motivate the response variables

selected to assess the performance of the classifiers. The confusion matrix response

71

variables are produced by each combination of device type and algorithm, for a total of

nine configurations per wireless protocol. The feature importance score is produced by the

random forest classifier, and is reported once per trial. All response variables are

numerical. Table 12 provides a summary of each response variable.

• Objective 1: Determine the ability of each classifier to classify a given packet to a

device type. The variables listed below are collectively referred to as the confusion

matrix response variables because they are derived from the confusion matrix.

o True Positives (TP): The TP response variable measures the number of true

positives, or packets that are correctly classified to a device type.

o False Positives (FP): The FP response variable measures the number of

false positives, or packets that are incorrectly classified as a device type.

o False Negatives (FN): The FN response variable measures the number of

false negatives, or packets that are incorrectly not classified as a device type.

The TP, FP, and FN variables are presented using the confusion matrix. The

confusion matrix is a square matrix with an equal number of row and columns, where a

row of the matrix represents the instances in an actual class, and a column represents the

instance in a predicted class. Figure 37 shows a 𝑘𝑘 𝑥𝑥 𝑘𝑘 confusion matrix, where 𝑘𝑘 is the

number of classes (in this case, three), 𝑁𝑁 is the total number of instances, and 𝑐𝑐𝑖𝑖𝑖𝑖 is the

number of instances with a true label of 𝑖𝑖 classified into class 𝑗𝑗. The total predicted count

for class 𝑥𝑥 is given by 𝑐𝑐∙𝑥𝑥, while the total actual count for class 𝑥𝑥 is given by summing the

values along the column 𝑐𝑐𝑥𝑥∙. The TP count for class 𝑥𝑥, 𝑇𝑇𝑃𝑃𝑥𝑥, is given by the value at 𝑐𝑐𝑥𝑥𝑥𝑥. It

can be observed that all class TP counts are found along the diagonal of the confusion

72

matrix. The FP count for class 𝑥𝑥, 𝐹𝐹𝑃𝑃𝑥𝑥, is calculated by 𝑐𝑐∙𝑥𝑥 − 𝑇𝑇𝑃𝑃𝑥𝑥. The FN count for class

𝑥𝑥, 𝐹𝐹𝑁𝑁𝑥𝑥, is calculated by 𝑐𝑐𝑥𝑥∙ − 𝑇𝑇𝑃𝑃𝑥𝑥.

Figure 37. A 𝒌𝒌 𝒙𝒙 𝒌𝒌 Confusion Matrix (𝒌𝒌 = 𝟑𝟑)

• Objective 2: Measure the importance of features in classification using the random

forests classifier.

o Feature Importance Score: The feature importance score measures how

significant a feature is in the classification model. The random forest

classifier, as implemented by scikit-learn, reports the feature importance

with values ranging from 0 to 1, where higher values correspond to higher

feature importance. The sum of feature importance scores is 1.

73

Table 12. Response Variables

Name Source/Formula Definition

TP Count for a class found at the diagonal of the confusion matrix Correctly classified packet

FP �𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶 𝑃𝑃𝑃𝑃𝑃𝑃𝑑𝑑𝑖𝑖𝑐𝑐𝑃𝑃𝑃𝑃𝑑𝑑 − 𝑇𝑇𝑃𝑃 Packets that were incorrectly classified as a particular class

FN �𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶 𝐴𝐴𝑐𝑐𝑃𝑃𝐴𝐴𝐶𝐶𝐶𝐶 − 𝑇𝑇𝑃𝑃 Packets that were incorrectly not classified as a particular class

Feature Importance Derived from random forests classifiers Measure of a given feature’s

usefulness in classification

4.4 Performance Metrics

The performance of the classifiers is measured using three metrics: the Matthews

correlation coefficient, mean recall, and mean precision. These performance metrics are

calculated using functions from the scikit-learn metrics module.

4.4.1 Matthews Correlation Coefficient (MCC)

The MCC metric provides a measure of a classifier’s overall classification

performance using the confusion matrix. Traditionally used as a binary classification

metric, the MCC has been successfully extended to multi-classification tasks [38] [39].

Functionally, the MCC metric is chosen over the popularly used accuracy metric because

accuracy misrepresents classifier performance in imbalanced datasets. If instances from

the majority class significantly outnumber minority classes, a classifier can potentially

report high accuracy scores simply by selecting the majority class for all instances. MCC

does not suffer from this type of misrepresentation. MCC not only accounts for class

imbalances, but also provides a convenient range of values to evaluate classifier

performance. The MCC metric exists in the range [−1, 1], where −1 represents perfect

misclassification and 1 represents perfect classification. A MCC value of 0 is calculated

74

for confusion matrices that performed random classification. The MCC metric is

calculated in this research using the sklearn.metrics matthews_corrcoef function.

The MCC can be calculated formally using

𝑀𝑀𝐶𝐶𝐶𝐶 = 𝑁𝑁 𝑇𝑇𝑇𝑇(𝐶𝐶)− ∑ 𝐶𝐶𝑘𝑘𝐶𝐶𝑙𝑙𝑘𝑘𝑙𝑙

�𝑁𝑁2− ∑ 𝐶𝐶𝑘𝑘(𝐶𝐶𝑇𝑇)𝑙𝑙𝑘𝑘𝑙𝑙 �𝑁𝑁2− ∑ (𝐶𝐶𝑇𝑇)𝑘𝑘𝐶𝐶𝑙𝑙𝑘𝑘𝑙𝑙 (2)

where 𝑁𝑁 is the total number of instances in the confusion matrix 𝐶𝐶, 𝑇𝑇𝑃𝑃(𝐶𝐶) is the trace or

sum of the confusion matrix diagonal, 𝐶𝐶𝑘𝑘 is the 𝑘𝑘th row of 𝐶𝐶, 𝐶𝐶𝑙𝑙 is the 𝐶𝐶th column of 𝐶𝐶,

and 𝐶𝐶𝑇𝑇 is 𝐶𝐶 transposed [39].

4.4.2 Mean Precision

The mean precision metric provides a measure of the overall positive predictive

power of a classifier by calculating the mean precision metric over all classes. Precision

is calculated using

𝑃𝑃𝑃𝑃𝑃𝑃𝑐𝑐𝑖𝑖𝐶𝐶𝑖𝑖𝑃𝑃𝑃𝑃 = 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇+𝐹𝐹𝑇𝑇

× 100 (3)

where TP represents the true positive count and FP represents the false positive count. A

classifier with high precision provides high confidence that a positive prediction is a correct

prediction. The mean precision metric is calculated in this research using the

sklearn.metrics precision_score function with macro-averaging, while individual

class precision metrics are calculated using the precision_score function without

averaging.

4.4.3 Mean Recall

The mean recall metric provides a measure of the overall success of a classifier as

tested on an imbalanced dataset by calculating the mean recall over all classes. The recall

75

metric measures the rate at which a classifier correctly identifies a given device type.

Recall is calculated using

𝑅𝑅𝑃𝑃𝑐𝑐𝐶𝐶𝐶𝐶𝐶𝐶 = 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇+𝐹𝐹𝑁𝑁

× 100 (4)

where TP represents the true positive count and FN represents the false negative count. A

classifier with high recall rarely overlooks an actual positive, giving high confidence that

the classifier can identify a complete set of actual positives. Mean recall is also known as

balanced accuracy, and recall is also known as sensitivity or true positive rate. The mean

recall metric is calculated in this research using the sklearn.metrics

balanced_accuracy_score function, while individual class recall metrics are calculated

using the recall_score function.

4.4.4 High Performance

Classifiers are evaluated on their performance using the MCC, mean recall, and

mean precision metrics. Table 13 provides a summary of the performance metrics,

including the units, accepted range, and performance threshold values of each metric.

Performance threshold values specify the values necessary for a metric to be considered

high or low performance, and are derived from hypothesized classifier performance. A

classifier that performs worse than random chance is considered a low performance

classifier. As such, the low performance threshold values reflect those expected from a

random chance classifier. The chosen high performance threshold values indicate

performance that is significantly better than random chance, and would reflect a classifier

with high classification performance.

76

Table 13. Performance Metrics

Metric Units Range High Performance Threshold

Low Performance Threshold

MCC -- -1 to 1 ≥ 0.50 < 0.0 Mean Recall % 0 to 100 ≥ 75% < 50%

Mean Precision % 0 to 100 ≥ 75% < 50%

4.5 Control Variables

The following variables are controlled in each trial to provide the classifiers

sufficient information.

• Classifier Hyperparameters: Each classifier has certain hyperparameters that may

be adjusted to improve performance. A range of tuning hyperparameters are

evaluated by DCP to produce the best-performing classifiers. Tuned classifiers are

used for the multiclass full-featured classification task, but not used for all other

trials.

• Source of Motion: Various devices in the smart home environment rely on motion

detection to initiate actions. A controlled source of motion is needed to consistently

initiate device actions. An Arduino Uno microcontroller is programmed to activate

a stepper motor every fifteen minutes. The stepper motor rotates an eight-inch rod

with a two-inch-wide paper flap attached to it. The entire apparatus resembles a

one-armed windmill, and its appearance and location in SHE is shown in Figure 38.

4.6 Uncontrolled Variables

The experiment assumes a realistic smart home environment. Therefore, the

collected BLE and Wi-Fi traffic cannot be controlled. The use of commercial devices and

77

an open wireless environment presents potential issues with outside noise and

accompanying unintended effects. Nevertheless, no significant noise is experienced

throughout data collection and experimentation.

4.7 Parameters

The experiment is performed under a number of parameters to replicate a realistic

smart home environment. Minimizing external factors is critical in achieving this. The

parameters in this experiment include:

• Location of Devices: Each device is placed in the same location in the testing

environment at the time of sniffing.

• Location of Sniffers: Each sniffer is placed in the same location in the testing

environment at the time of sniffing.

• Number of Devices: The number of devices does not change.

• Type of Devices: The types of devices present in each test setup does not change.

• Sniffing Equipment: The same sniffing equipment is used to collect wireless

packets. This includes antennas, wireless dongles, and software tools. This limits

the amount of instrumentation noise that could be introduced by the equipment.

• Computing Environment: The computing environment that performs the

classification is kept constant. This includes operating system, system resources,

programming languages, and hardware.

78

Figure 38. Motion Source Appearance (left) and Location in SHE (right)

79

4.8 Factors

The experiment factors are the changes done between experimental trials. The

experiment uses two factors in evaluating the classifiers’ performance: classification task

and selected features.

The classification task factor refers to the number of response classes (i.e., device

types) being classified. Two classification tasks are used in the experiment: multiclass and

binary. Multiclass classification is performed to classify all given device types in the

wireless protocol, while binary classification only classifies between two device types.

Multiclass classification is the primary focus of the experiment as it provides the most

realistic classification value. Binary classification is completed for classification analysis

purposes, especially when classifiers are having difficulties distinguishing between two

given device types.

The selected features factor refers to the number of features employed in the

classification process (see Table 6 and Table 9 for the full list of Wi-Fi and BLE features,

respectively). Two feature configurations are used in this experiment: full-featured and

best-features. Full-featured multiclass classification uses all available features for the

given protocol to classify devices into their respective device types. The tuned models

with optimal hyperparameters are used in this task because the tuning process applies all

features. The best-features multiclass classification uses only the k best-performing

features from full-featured multiclass classification, where k is an integer no greater than

the total of all available features. The use of a small subset of relevant features frequently

results in improved performance because of the removal of noisy and redundant features

80

that confuse the classifiers [40]. In this experiment, k is set to 3 in order to standardize the

evaluation of both BLE and Wi-Fi classifiers.

4.9 Experimental Design

The experiment proceeds in three stages: data collection, model tuning, and model

testing. The complete list of steps needed to perform the experiment is provided in

Appendix C. SHE devices are set up and allowed to reach a steady state for one day. Data

collection for the BLE and Wi-Fi devices then occurs over a period of three days. Wireless

packet sniffing is performed for eight hours for each data collection day. Once data

collection is complete, two Jupyter notebooks running DCP, one each for BLE and Wi-Fi,

are used to clean and process data, extract features, and perform model tuning. Once model

tuning is complete, the classifiers are updated to use the best-performing hyperparameter

values. The test dataset is adjusted using down-sampling to maintain a uniform distribution

of packets across all device types. Random down-sampling selects data points at random

and removes them from the dataset, ensuring that all device types have an equal number of

packets in the test dataset.

Model testing is done using a Jupyter notebook written in Python, one each for BLE

and Wi-Fi (see Appendix D and E). The models are evaluated on two classification tasks:

full-featured multiclass classification and best-features classification. The full set of

features is used to evaluate the overall classification performance of the machine learning

classifiers, while best-features classification is used to check if performance can be

improved by removing irrelevant features. Using the random forests feature importance

scores, the three most important features are selected and used to retrain and retest the

81

KNN, RF, and LDA classifiers. By reducing the number of features, feature confusion can

be minimized. Response variables and performance metrics are extracted, and are stored

in csv files. Evaluating the performance of the classifiers on the test set is the primary

focus of Chapter 5.

4.10 Methodology Summary

This chapter provides the experimentation methodology used to evaluate the

performance of the classifiers through the confusion matrix and feature importance

response variables, and the Matthews correlation coefficient, mean recall, and mean

precision metrics.

82

V. Results and Analysis

5.1 Overview

This chapter provides the results obtained from the experimentation described in

Chapter 4. Classifier performance is evaluated using MCC, mean recall, and mean

precision performance metrics. When needed, further analysis is completed using the

confusion matrix results and feature importance scores. Classifiers are evaluated on their

performance in the full-featured multiclass classification task where all available features

are used in classification, and then in the best-features multiclass classification task where

only the most relevant features are used, as determined by feature importance scores.

Lastly, classification analysis is completed to discuss notable behavior by the classifiers.

Sections 5.2 and 5.3 examine the classifiers’ performance in the BLE and Wi-Fi datasets

respectively.

5.2 BLE Classifier Performance

The classification task for BLE classifiers is categorizing a BLE packets into one

of three device types: door sensor, lock, or temperature sensor. Results are calculated using

the Jupyter notebook in Appendix D.

5.2.1 BLE Full-featured Classification

This section examines the performance of the classifiers when the full set of

available features are used, otherwise known as full-featured classification. Figure 39 and

Table 14 provide the confusion matrices and overall performance metrics of the BLE

classifiers for full-featured classification. The MCC metric provides a measure of a

classifier’s overall classification performance using the confusion matrix, where -1

83

indicates perfect misclassification and 1 indicates perfect classification. The KNN

classifier achieves the best overall performance with an MCC of 0.55, a mean precision of

54%, and a mean recall of 64%. The RF classifier attains an MCC of 0.00, indicating that

its performance is comparable to random chance, while the LDA classifier achieves a

negative MCC value, revealing its poor value as a classifier. Out of all classifiers, only the

KNN classifier’s MCC metric succeeds in exceeding the high performance threshold.

Table 14. Classifier Performance in BLE Full-Featured Classification

MCC Mean Precision Mean Recall KNN 0.55 54.1% 64.0% RF 0.00 17.1% 33.5%

LDA -0.61 0.0% 0.0%

The precision metric measures the positive predictive power of the classifiers. A

classifier with high precision provides high confidence that a positive prediction is a correct

prediction. Table 15 provides the precision scores for all BLE device types on the full-

featured classification task. While KNN achieves a perfect precision score on lock devices,

it performs poorly on the door sensor and temperature sensor devices. This mixed

performance explains KNN’s low mean precision score. As expected from their low mean

precision scores, the RF and LDA classifiers achieve poor individual precision scores, with

the LDA classifier managing to score 0.0% precision for all device types.

84

Figure 39. Confusion Matrices from BLE Full-Featured Classification

85

Table 15. Device Type Precision in BLE Full-Featured Classification

Door Sensor Lock Temp Sensor KNN 12.3% 100.0% 50.0% RF 1.1% 0.0% 50.1%

LDA 0.0% 0.0% 0.0% Mean 4.5% 33.3% 33.4%

The recall performance metric measures the completeness of the classifiers. A

classifier with high recall provides high confidence that a complete set of actual positives

is found. Table 16 provides the recall scores for all BLE device types on the full-featured

classification task. KNN again achieves the best performance, with excellent scores on

two device types. Given their recall scores, high confidence can be placed that KNN can

successfully identify the majority of lock and temperature sensor devices. However,

KNN’s performance is severely diminished by the poor recall score of 1.3% on the door

devices. The RF classifier manages an outstanding 99.3% recall score on the temperature

sensor devices, but scored poorly on the other two device types. The LDA classifier

continues its low performance with a 0.0% recall score for all device types.

Table 16. Classifier Recall in BLE Full-Featured Classification

Door Sensor Lock Temp Sensor KNN 1.3% 92.0% 98.8% RF 1.1% 0.0% 99.3%

LDA 0.0% 0.0% 0.0% Mean 0.8% 30.7% 66.0%

The feature importance score measures the significance of a feature in

classification. Feature importance scores range from 0 to 1.0, where higher values indicate

86

more important features. Feature importance is reported from the random forest classifier.

It is important to note that the random forests classifier is only performing at chance, with

an MCC value of 0.00. Therefore it cannot be assumed that the reported feature importance

scores are meaningful. Nevertheless, their scores are reported here for completeness.

Table 17 provides the feature importance scores obtained from full-featured classification.

Out of the twelve available features, seven report nonzero feature importance scores.

Packet length, BLE link layer header length, and associated packet count are observed as

the three most important features, and combined account for over two-thirds of the feature

importance. This indicates that a majority of the random forest classifier’s decision-

making rely on these three features. Five features report a feature importance score of

0.000, signifying that the random forest classifier does not benefit from these features. It

is interesting to note that the top three features are numerical features and the features with

scores of 0.000 are categorical features. A possible explanation is that because categorical

features are one-hot encoded, the value of their information is spread across multiple

individual features. For example, the PDU type feature is one-hot encoded to six binary

features. Therefore, the feature importance of the PDU type feature is divided into six

separate feature importance scores. As such, numerical features tend to report high

importance scores because categorical scores are spread out.

87

Table 17. Feature Importance in BLE Full-Featured Classification

Feature Score 1 Packet Length 0.327 2 BLE LL Length 0.212 3 Associated Packet Count 0.193 4 SCAN_RSP PDU Type 0.135 5 ADV_IND PDU Type 0.106 6 SCAN_REQ PDU Type 0.026 7 Channel 39 0.001 8 ADV_DIRECT_IND PDU Type 0.000 9 ADV_NONCONN_IND PDU Type 0.000 10 CONNECT_REQ PDU Type 0.000 11 Channel 0 0.000 12 Channel 12 0.000

5.2.2 BLE Best-Features Classification

Best-features classification uses a small subset of features from the original feature

set. The use of a small subset of relevant features frequently results in improved

performance because of the removal of noisy and redundant features that confuse the

classifiers [40]. Best-features classification is performed after full-featured classification

to take advantage of the feature importance scores obtained from full-featured

classification.

The three best features reported by BLE full-featured classification are the packet

length, BLE link layer header length, and associated packet count features (see Table 17).

The classifiers are retrained and retested using a dataset containing only these three features

to execute best-features classification. Table 18 provides the overall performance metrics

of the BLE classifiers in best-features classification. As hypothesized, the performance of

the classifiers improves in best-features classification. The KNN classifier experiences

88

minor improvement across all metrics, with a 0.02 points added to MCC, 3.7% added to

mean precision, and 1.2% added to mean recall. The RF classifier and LDA classifier both

experienced significant improvements, with the RF classifier adding 0.57 points to its

MCC, 36.2% added to mean precision, and 31.6% added to mean recall, and the LDA

classifier adding 0.61 points to its MCC, 17.3% points to its mean precision, and 33.4%

points to its mean recall.

Table 18. Classifier Performance in BLE Best-3 Feature Classification


LDA 0.00 17.3% 33.4%

5.2.3 BLE Classification Analysis

Classification analysis is performed to examine notable observations and offer

explanations that caused them. The key classification observation in the BLE dataset is

the misclassification of door sensors as temperature sensors. Figure 39 provides the

confusion matrix results from BLE full-featured classification. As described in Section

4.3, the x-axis shows the predicted labels and the y-axis shows the true labels. Cells with

darker colors indicate a higher count of instances. It is observed that both KNN and RF

classifiers heavily misclassify door sensors as temperature sensors. This misclassification

directly contributes to the classifiers’ poor recall performances. To understand this

misclassification, a classification trial is prepared with only door sensors and temperature

sensors. The classification task is adjusted to a binary classification task between the two

device types. Table 19 and Figure 40 provide the respective overall performance metrics

89

and confusion matrix for this binary classification. The overall performance metrics

confirm that the classifiers cannot reliably distinguish between door sensors and

temperature sensors. All classifiers achieve MCC scores close to zero, indicating that their

value as classifiers resembles that of random guessing. Similarly, the mean recall scores

are approximately 50%, signifying that the classifiers are only able to consistently discern

a complete set of device types half the time. The confusion matrix results imply that

because the classifiers are not capable of finding a meaningful difference between the

device types, the classifiers are reduced to a naïve strategy of categorizing the vast majority

of instances into a single device type. Considering the top three features, it becomes clear

that there are no clear differences between door sensors and temperature sensors. Door

sensors and temperature sensors have nearly identical distributions for the packet length,

BLE LL length, and associated packet count features, explaining why the classifiers

experienced difficulty in separating the two device types (see Appendix E).

Table 19. Classifier Performance in BLE Door Sensors vs Temperature Sensors

MCC Mean Precision Mean Recall KNN -0.01 25.0% 50.0% RF 0.02 55.2% 50.2%

LDA 0.03 59.4% 50.2%

90

Figure 40. Confusion Matrices from BLE Door Sensors vs Temperature Sensors

91

5.3 Wi-Fi Classifier Performance

The classification task for Wi-Fi classifiers is categorizing a given Wi-Fi packet

into one of three device types: bulb, camera, or smart plug. Results are calculated using

the Jupyter notebook in Appendix E.

5.3.1 Wi-Fi Full-featured Classification

Figure 41 and Table 20 provide the confusion matrices and the overall performance

metrics of the Wi-Fi classifiers in full-featured classification. Out of the three classifiers,

two achieve MCC values above the high-performance threshold, indicating they have

noteworthy classification value. The KNN classifier attains the best overall performance

with all metrics exceeding the high-performance threshold as mentioned in Section 4.4.4.

The LDA classifier succeeds with a high-performance MCC, but fails to achieve high

performance in its mean precision and mean recall scores. The RF classifier fails to achieve

any high performance metrics, but nevertheless manages respectable performance levels.

Table 20. Classifier Performance in Wi-Fi Full-Featured Classification


LDA 0.84 89.7% 89.2%

92

Figure 41. Confusion Matrices from Wi-Fi Full-Featured Classification

93

Table 21 and Table 22 provide the precision and recall scores for all Wi-Fi device

types respectively. Out of the three device types, the KNN and RF classifiers achieve high

precision scores on two device types, while the LDA classifier achieves high precision

scores on all three device types. Comparing between device types, the classifiers achieves

the highest mean precision on the bulb devices, suggesting the classifiers are able to

positively predict the bulb devices at a higher rate than the camera and plug devices. The

recall scores confirm this idea, as all three classifiers achieve high recall scores on the bulb

device type. Recall scores for all three classifiers on the plug device type also meet the

high performance threshold, implying that Wi-Fi classifiers are able to successfully

identify two out of the three device types.

Table 21. Device Type Precision in Wi-Fi Full-Featured Classification

Bulb Camera Plug KNN 100.0% 74.9% 68.3% RF 67.0% 95.9% 93.4%

LDA 100.0% 79.3% 89.8% Mean 89.0% 83.4% 83.8%

Table 22. Device Type Recall in Wi-Fi Full-Featured Classification

Bulb Camera Plug KNN 96.7% 63.1% 81.3% RF 100.0% 64.0% 78.3%

LDA 100.0% 91.4% 76.1% Mean 98.9% 72.8% 78.6%

Table 23 provides the feature importance scores obtained from Wi-Fi full-featured

classification. Out of the ten available features, eight report nonzero feature importance

94

scores. The Belkin vendor, associated packet count, and Dropcam vendor features are

observed as the three most important features, and together account for over 90% of the

feature importance. This indicates that a majority of the random forest classifier’s

decision-making depended on these three features. One feature (Tp-Link) reports a feature

importance score of 0.000, signifying that the random forest classifier does not benefit from

this feature.

Table 23. Feature Importance in Wi-Fi Full-Featured Classification

Feature Score 1 Belkin Vendor 0.531 2 Associated Packet Count 0.283 3 Dropcam Vendor 0.103 4 Packet Length 0.035 5 QoS_Null Packet Subtype 0.021 6 QoS_Data Packet Subtype 0.015 7 Data Packet Subtype 0.008 8 Lifi Vendor 0.005 9 Tp-link Vendor 0.000

5.3.2 Wi-Fi Best-features Classification

Similar to BLE classification, best-features classification is performed after full-

featured classification to benefit from the feature importance scores from full-featured

classification. It is hypothesized that the use of a reduced subset of relevant features results

in improved classification performance.

The three best features reported by Wi-Fi full-featured classification are the

Dropcam vendor, Belkin vendor, and associated packet count features (see Table 23). The

classifiers are retrained and retested using a dataset containing only these three features to

95

execute best-features classification. Table 24 provides the overall performance metrics of

the Wi-Fi classifiers in best-features classification. Best-features classification results in

significant decline of performance for all classifiers. The KNN classifier experiences a

decline in performance across all metrics, with a 0.38 point reduction in MCC, 38.3%

subtracted from mean precision, and 27.8% subtracted from mean recall. The RF classifier

also receives substantial decreases across all metrics, with a 0.24 point reduction in MCC,

37.0% subtracted from mean precision, and 20.9% subtracted from mean recall. Lastly,

the LDA classifier receives significant losses in performance, with a 0.28 point reduction

in MCC, 11.0% subtracted from mean precision, and 21.4% subtracted from mean recall.

Table 24. Classifier Performance in Wi-Fi Best-3 Feature Classification


LDA 0.56 78.7% 67.8%

5.3.3 Wi-Fi Classification Analysis

Wi-Fi classification analysis is performed to understand two notable observations:

the misclassification of camera devices as plug devices and the reliance of the classifiers

on vendor features.

The first notable observation is the misclassification of cameras as plug devices.

Figure 41 provides the confusion matrix results from Wi-Fi full-featured classification. It

is observed that all three classifiers misclassify cameras as plugs to varying degrees. To

understand this misclassification, a classification trial is prepared with only camera and

plug device types in the training and test sets. The classification task is adjusted to a binary

96

classification between the two device types. Table 25 and Figure 42 provide the respective

overall performance metrics and confusion matrix for the binary classification of cameras

and plugs. The overall performance metrics report a loss of performance across all

classifiers. Notably, the KNN classifier sees a 0.79 drop in its MCC score. By contrast,

the RF classifier’s performance experiences a lesser yet still significant decline, with the

MCC dropping by 0.22 points. The confusion matrix results suggest that the KNN and

LDA classifiers classify the majority of packets as plug devices, while the RF classifier

successfully separates the two device types.

The random forest feature importance scores are then analyzed to understand which

features the RF classifier uses to achieve this. Table 26 provides the feature importance

scores in the cameras versus plugs classification. The best features are observed as the

Belkin vendor, associated packet count, and Dropcam vendor features. Interestingly, these

are the same best features found by full-featured classification, except their importance

order are switched around. Looking into the feature distributions between cameras and

plugs, it can be observed that while both device types have devices manufactured by

Belkin, there are more Belkin plug devices than there are Belkin camera devices (see

Appendix F). Additionally, the camera device type has more instances of associated packet

counts 1 to 4. At this point, it is hypothesized that using the full set of available features

adds noise to the classification, and that using only the best features would improve the

performance of the KNN classifier, maintain the performance of the RF classifier, and

diminish the performance of the LDA classifier.

97

Table 25. Classifier Performance in Wi-Fi Cameras vs Plugs (Full-Featured Classification)

MCC Mean Precision Mean Recall KNN -0.08 45.1% 46.8% RF 0.51 75.6% 75.5%

LDA 0.13 57.0% 55.9%

Table 26. Feature Importance in Wi-Fi Cameras vs Plugs (Full-Featured

Classification)

Feature Score 1 Belkin Vendor 0.455 2 Dropcam Vendor 0.273 3 Associated Packet Count 0.192 4 QoS_Null Packet Subtype 0.027 5 Packet Length 0.025 6 Data Packet Subtype 0.024 7 QoS_Null Packet Subtype 0.012 8 Tp-Link Vendor 0.000 9 Lifi Vendor 0.000

The classification task is adjusted to use only the features with the best importance

scores: Belkin vendor, Dropcam vendor, and associated packet count. Table 27 provides

the overall performance of the classifiers on the binary classification with best features.

The classifier performances change as predicted. The KNN classifier’s performance

metrics are restored to decent values, with the MCC returning to a positive value, and the

mean precision and mean recall scores returning to above 50%. The RF classifier maintains

its previous performance, while the LDA classifier experiences a slight drop in

performance.

98

Table 27. Classifier Performance in Wi-Fi Cameras vs Plugs (Best Features

Classification)


LDA 0.05 61.5% 50.5%

The second notable observation made from Wi-Fi classification is the reliance of

the full-featured classification on certain vendor features (e.g., Dropcam, Belkin) but

complete independence from the other vendor features (e.g., Lifi, Tp-Link). A

classification trial was performed to analyze how classifier performance is affected if no

vendor features are used in the classification. It was hypothesized that classifier

performance would decline because of the absence of all vendor features, and that the

associated packet count feature would be an important feature. Table 28 and Figure 43

provides the respective performance metrics and confusion matrix of the Wi-Fi

classification with no vendor features. As hypothesized, all classifiers experienced a

decline in performance, as compared to the full-featured classification. However, the

degree to which the performance declined was unexpected. The KNN and LDA classifiers

suffered significant reductions in performance, with the KNN classifier experiencing a

38% decrease in MCC, 31% decrease in mean precision, and 25% decrease in mean recall,

and the LDA classifier experiencing an 82% decrease in MCC, 32% decrease in mean

precision, and 44% decrease in mean recall. The RF classifier experienced a minor drop

in performance, with a 10% decrease in MCC, 13% decrease in mean precision, and 4%

decrease in mean recall.

99

Table 28. Classifier Performance in Wi-Fi Classification (No Vendor Features)


LDA 0.10 48.4% 39.6%

The random forest feature importance scores were analyzed to understand which

features the RF classifier found most significant. Table 29 provides the feature importance

scores in the cameras vs. plugs classification. The associated packets count is observed as

by far the most important feature, garnering over 90% of the feature importance. This

result suggests that the vendor features provide significant information, the classifiers are

able to extract sufficient information from the associated packets count feature to achieve

meaningful results.

Table 29. Feature Importance in Wi-Fi Classification (No Vendor Features)

Feature Score 1 Associated Packet Count 0.918 2 Packet Length 0.046 3 QoS_Null Packet Subtype 0.014 4 Data Packet Subtype 0.013 5 QoS_Data Packet Subtype 0.009

100

Figure 42. Confusion Matrices from Wi-Fi Cameras vs Plugs

Figure 43. Confusion Matrices from Wi-Fi Classification (No Vendor Features)

101

5.4 Results Summary

This section reviews the results of classification trials and analyzes the classifier

performances on the BLE and Wi-Fi device type classification tasks. Table 30 and Table

31 summarize the classifiers performance on the classification tasks, showing which

classifiers met the criteria for high performance and low performance. Out of the BLE

classifiers, only the KNN classifier managed to achieve a high performance in both the

full-featured and best-features classification tasks, getting excellent MCC scores in both.

Out of the Wi-Fi classifiers, KNN succeeded in achieved high performance across all

metrics for the full-featured and no vendor features classification tasks. The LDA classifier

attained two high performance metrics in the best-feature classification task.

Table 30. High and Low Performance BLE Classifiers

102

Table 31. High and Low Performance Wi-Fi Classifiers

103

VI. Conclusion

6.1 Overview

This chapter provides a summary of the research and results found during

experimentation. Section 6.2 reviews the conclusions taken from the experiment and

results, while Section 6.3 offers a review of the research’s significance. Finally, Section

6.4 presents opportunities for future work in this research area.

6.2 Research Conclusions

The research goals that guide this thesis are successfully met through five

contributions:

1. Design and build a source of realistic smart home device traffic:

designing and building SHE to produce real-life smart home wireless traffic

2. Develop procedures to collect and prepare the wireless traffic for

machine learning classification: developing DCP to collect and prepare

wireless data for machine learning

3. Evaluate the performance of the linear discriminant analysis (LDA), k-

nearest neighbors (KNN), and random forests (RF) machine learning

classification algorithms in determining IoT device types: implementing

and evaluating LDA, KNN, and RF classifier performances using

experimental trials

4. Determine which features are most useful for classification purposes:

reporting feature importance scores used by RF classifiers in the experiment

104

5. Assess the suitability of machine learning towards the task of IoT

device type classification: discussed below, with an assessment of the

machine learning approach used in this research towards the task of IoT

device type classification.

The hypothesis presented in this research is if machine learning classifiers are

trained using wireless traffic from a realistic smart home environment, then the classifiers

can successfully identify the device type of IoT devices to a high degree of performance.

This research provides mixed results towards answering this hypothesis. A smart home

environment was successfully created and used towards training machine learning

classifiers. However, the classifiers achieved moderate levels of performance on the BLE

dataset and high levels of performances on the Wi-Fi dataset. On average, the classifiers

were able to identify BLE device types with an MCC of -0.02, a mean precision of 23.7%,

and a mean recall of 32.5%, and Wi-Fi device types with an MCC of 0.76, a mean precision

of 85.4%, and a mean recall of 83.4%. Therefore, when viewed as a whole, the research

results provide moderate support for the hypothesis. However, individual classifiers

managed to achieve higher levels of success. In the BLE dataset, the KNN classifier

achieved an MCC of 0.55, a mean precision of 54.1%, and a mean recall of 64%. In the

Wi-Fi dataset, the LDA classifier achieved an MCC of 0.84, a mean precision of 89.7%,

and a mean recall of 83.4%. While not shared by the RF and LDA classifiers, these

individual moments of high performance suggest that machine learning can indeed be

applied toward the task of IoT device classification, and therefore provides support for the

research hypothesis.

105

6.3 Research Significance

The research completed in this thesis offers relevant insights for machine learning

and its applications in IoT cybersecurity. This research presents the first, and at the time

of this work, the only application of machine learning towards the task of IoT device type

classification. This research uses three different classification methods: linear

transformation (LDA), decision trees (RF), and non-parametric methods (KNN). The data

exploration revealed certain wireless traffic patterns that may guide new research attempts

in this area. Lastly, while the research ultimately yielded mixed results, the methodology

applied a straightforward approach that serves as a necessary stepping-stone for future

efforts.

6.4 Future Work

There are several opportunities in extending this research area as there is currently

a lack of research in the intersection of IoT device security and machine learning

classification. Five future work possibilities are offered below:

1. Development of a sequential pattern-of-life tracking tool: Smart device

usage may indicate a subject’s location within an area, providing

information on the subject’s pattern-of-life. By applying the techniques

used in this research, it may be possible to develop a classifier that can

sequentially track a subject’s actions and movements and create a log of the

subject’s activities.

2. Expansion of devices in the smart home environment. The number,

device type, wireless protocol, model, and manufacturer of devices in the

106

smart home environment can be expanded to produce a training set that

includes a more robust selection of IoT devices currently in use in the

market today. The inclusion of more devices with a single manufacturer

but with varying device types is a particularly interesting idea as only a

limited number of these devices were used in this research.

3. Scalability to multiple smart home environments. Attempts to deploy

this research in a scalable matter could involve expanding the smart home

environment to several environments. The classifiers were trained using a

dataset limited to a single home environment with an individual user. The

dataset can be expanded to include multiple home environments with

multiple users. Doing so would introduce a larger variety of smart device

usage patterns that may reduce bias in the dataset due to only having a single

user. Care must be taken in first developing a big data system that can

handle the substantial volume of wireless traffic data produced by multiple

smart home environments.

4. Feature extraction. Features can be obtained across different levels in the

data. BLE and Wi-Fi header information from the individual packet level

is the primary source of features used by DCP. Features from the flow level

and connection level can be extracted and derived. Applying a more

complex approach that factors device interactions with time-based features

may produce more promising results.

107

5. Classification algorithms. More classification algorithms and techniques

can be applied. A myriad of sophisticated techniques, including support

vector machines and deep learning, exist that may yield better classification

performances.

108

Appendix A. Device Details

Name Brand Model Model Number Serial Number Device Type Protocol Device Setup MAC Address

August1 August Smart Lock ASL3B L4FWQ02EL4 Lock BLE Training

August2 August Smart Lock ASL01 L1GHX005D6 Lock BLE Test

Door1 Eve Door & Window 2ED309901000 CU49F1A03655 Door Sensor BLE Training

Door2 Eve Door & Window 20EAL9901 DV13H1A00054 Door Sensor BLE Test

Dropcam Dropcam WiFi Video Monitoring

DROPCAM3HDB 308CFB3A1AAD Camera WiFi Training 308CFB3A1AAD

Home1 BLE Home Door Sensor 1444BE Door Sensor BLE Training

Home2 BLE Home Door Sensor 1444A1 Door Sensor BLE Training

Insight Belkin Wemo Insight Switch F7C029V2 231618K12013ED Plug WiFi Training 14918224DD35

Kasa TPLink Kasa Cam KC120 2184339000783 Camera WiFi Test AC84C6977CCC

Kevo Kwikset Kevo 925GED1500MK2 3022AMK2 Lock BLE Training

Lifx1 Lifx Lightbulb LHA19E26UC10 D073D526B84C Light Bulb WiFi Training D073D526B84C

Lifx2 Lifx Lightbulb LHA19E26UC10 D073D526C927 Light Bulb WiFi Test D073D526C927

Mini Belkin Wemo Mini F7C063 221708K0100DEA Plug WiFi Test 6038E0EE7CE5

109

Netcam1 Belkin NetCam HD+ F7D7602V2 35418VB2200526 Camera WiFi Training EC1A59E4FD41

Netcam2 Belkin NetCam HD+ F7D7602V2 35418VB2200320 Camera WiFi Training EC1A59E4FA09

Netcam3 Belkin NetCam HD+ F7D7602V2 35418VB2200833 Camera WiFi Test EC1A59E5020D

Push SensorPush Smart Sensor HT1 2AL9XHT1 Temp Sensor BLE Training

Room1 Eve Room 2ER309901000 BU45F1A03216 Temp Sensor BLE Training

Room2 Eve Room 2ER309901000 BU35E1A02542 Temp Sensor BLE Test

Yosemite Netgear Nighthawk X4S AC2600 R7800 4H4E855K01D4E Router WiFi NA 78D2944DAB3F

Switch1 Belkin Wemo Switch F7C027 221621K01027F9 Plug WiFi Training 149182CDDF3D

Switch2 Belkin Wemo Switch F7C027 221343K010034E Plug WiFi Training B4750E0D9465

Switch3 Belkin Wemo Switch F7C027 221342K0101C51 Plug WiFi Test B4750E0D33D5

Switch4 Belkin Wemo Switch F7C027 221417K01007F1 Plug WiFi Test 94103E2B7A55

TpBulb TPLink Smart WiFi LED Bulb LB100E26 217C581015895 Light Bulb WiFi Training B04E26C52A41

TpPlug TPLink Smart WiFi Plug HS100 2179815005849 Plug WiFi Test 704F57F9E1B8

110

Weather Eve Weather 2EW309901000 AU40F1A04650 Temp Sensor BLE Training

111

Appendix B. Hyperparameter Tuning Script

1. # coding: utf-8 2. 3. # In[12]: 4. 5. 6. # from Pipeline import BLEPipeline, WifiPipeline 7. from MulticlassDCP import BLEMulticlassDCP, WifiMulticlassDCP 8. 9. # General data processing 10. import numpy as np 11. import pandas as pd 12. 13. # Plotting 14. import matplotlib.pyplot as plt 15. import seaborn as sns 16. import scikitplot as skplt 17. 18. # ML libraries 19. from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 20. from sklearn.ensemble import RandomForestClassifier 21. from sklearn.neighbors import KNeighborsClassifier 22. from sklearn.metrics import balanced_accuracy_score, precision_score, recall_sco

re, roc_auc_score, confusion_matrix 23. from sklearn.model_selection import GridSearchCV, KFold, cross_val_score 24. 25. # System libraries 26. import itertools 27. import random, time 28. 29. # Warning filtering 30. import warnings 31. warnings.filterwarnings("ignore", category=FutureWarning) 32. warnings.filterwarnings("ignore", category=UserWarning) 33. warnings.filterwarnings("ignore", category=DeprecationWarning) 34. plt.rcParams.update({'figure.max_open_warning': 0}) 35. 36. 37. # # Wi-Fi Tuning 38. 39. # In[13]: 40. 41. 42. start_time = time.time() 43. 44. 45. # In[14]: 46. 47. 48. w = WifiMulticlassDCP() 49. 50. 51. # In[15]: 52. 53.

112

54. df = w.make_dataframe() 55. 56. # Take out packets from router 57. df = df[df["DeviceType"]!="router"] 58. 59. 60. # In[16]: 61. 62. 63. # Divide training and test sets 64. df_train = df[df['Set']=='train'] 65. df_test = df[df['Set']=='test'] 66. 67. 68. # In[17]: 69. 70. 71. # Wifi: Define which features to use 72. features_list = [ 73. # Packet info 74. "PacketLength", 75. 76. # Vendor 77. "Belkin", "Dropcam", "Lifi", "Tp-link", 78. 79. # 802.11 Data subtype 80. "Data", "QoS_Data", "QoS_Null", 81. 82. # Associated Packets 83. "Assoc_Packets"] 84. 85. # Define what the response classes are 86. y_list = ["bulb", "camera", "plug"] 87. 88. 89. # In[18]: 90. 91. 92. # Define grid values 93. knn_param_grid = dict(n_neighbors=np.arange(1,19,2)) 94. rf_param_grid = dict(max_features=np.linspace(2, len(features_list), num=5, dtyp

e=int)) 95. lda_param_grid = dict(n_components=np.arange(1,5)) 96. 97. # Time wifi gridsearch 98. wifi_start = time.time() 99. 100. # Run gridsearch 101. w_knn = w.tune_gridsearch(KNeighborsClassifier(), knn_param_grid, df_tra

in, 102. features_list, y_list) 103. w_rf = w.tune_gridsearch(RandomForestClassifier(), rf_param_grid, df_tra

in, 104. features_list, y_list) 105. w_lda = w.tune_gridsearch(LinearDiscriminantAnalysis(priors=[0.61678342,

0.37815795, 0.00505862]), lda_param_grid, df_train, 106. features_list, y_list) 107.

113

108. wifi_end = time.time() - wifi_start 109. 110. 111. # In[19]: 112. 113. 114. print wifi_end, "sec" 115. 116. 117. # In[20]: 118. 119. 120. print w_knn['grid_result'].best_score_, w_knn['grid_result'].best_params

_ 121. print w_lda['grid_result'].best_score_, w_lda['grid_result'].best_params

_ 122. print w_rf['grid_result'].best_score_, w_rf['grid_result'].best_params_

123. 124. 125. # In[21]: 126. 127. 128. w.plot_all_vcs([w_knn, w_lda, w_rf]) 129. 130. 131. # # BLE Tuning 132. 133. # In[22]: 134. 135. 136. b = BLEMulticlassDCP() 137. 138. 139. # In[23]: 140. 141. 142. bdf = b.make_dataframe() 143. 144. 145. # In[24]: 146. 147. 148. # Divide training and test sets 149. bdf_train = bdf[bdf['Set']=='train'] 150. bdf_test = bdf[bdf['Set']=='test'] 151. 152. 153. # In[25]: 154. 155. 156. # BLE: Define which features to use 157. features_list = [ 158. # Packet info 159. "PacketLength", "BLE_LL_Length", 160. 161. # Associate Packets 162. "Assoc_Packets",

114

163. 164. # Channel number 165. "Channel_0", "Channel_12", "Channel_39", 166. 167. # PDU Type 168. "SCAN_RSP", "ADV_IND", "SCAN_REQ", 169. "CONNECT_REQ", "ADV_NONCONN_IND", "ADV_DIRECT_IND"] 170. 171. y_list = ["door", "lock", "temp"] 172. 173. 174. # In[26]: 175. 176. 177. # Define grid values 178. knn_param_grid = dict(n_neighbors=np.arange(1,19,2)) 179. rf_param_grid = dict(max_features=np.linspace(2, len(features_list), num

=5, dtype=int)) 180. lda_param_grid = dict(n_components=np.arange(1,5)) 181. 182. # Time BLE gridsearch 183. ble_start = time.time() 184. 185. # Run gridsearch 186. b_knn = b.tune_gridsearch(KNeighborsClassifier(), knn_param_grid, bdf_tr

ain, 187. features_list, y_list) 188. b_rf = b.tune_gridsearch(RandomForestClassifier(), rf_param_grid, bdf_tr

ain, 189. features_list, y_list) 190. b_lda = b.tune_gridsearch(LinearDiscriminantAnalysis(priors=[0.59063441,

0.23399223, 0.17537336]), lda_param_grid, bdf_train, 191. features_list, y_list) 192. 193. ble_end = ble_start - time.time() 194. 195. 196. # In[27]: 197. 198. 199. print ble_end 200. 201. 202. # In[28]: 203. 204. 205. print b_knn['grid_result'].best_score_, b_knn['grid_result'].best_params

_ 206. print b_lda['grid_result'].best_score_, b_lda['grid_result'].best_params

_ 207. print b_rf['grid_result'].best_score_, b_rf['grid_result'].best_params_

208. 209. 210. # In[29]: 211. 212. 213. b.plot_all_vcs([b_knn, b_lda, b_rf])

115

214. 215. 216. # In[30]: 217. 218. 219. end_time = time.time() - start_time 220. total_gridsearch_time = ble_end + wifi_end 221. print total_gridsearch_time, "sec" 222. print end_time, "sec"

116

Appendix C. Experimental Procedure

1. Set up devices in designated locations as described in Section 3.2.5. Configure

device actions as described on Table 4.

2. Set up motion source. Ensure Arduino microcontroller is powered on and

operational.

3. Let devices reach steady state over the course of one day.

4. Perform scans using Plugable Bluetooth adapter and Alfa card. Use the commands

as described in Figure 22 and 15 for Wi-Fi scanning, and Figure 25 for BLE

scanning.

5. Perform sniffing using Ubertooth One sniffers and Alfa card. Use the commands

as described in Section 3.3.2. Let run for 8 hours. Perform for a total of three days.

6. Perform model tuning by executing the Jupyter notebook in Appendix B. Update

MulticlassDCP.py with the best-performing hyperparameter values. Find class

prior probability values and update LDA classifier in MulticlassDCP.py with these

values.

7. Perform data preprocessing and model testing. Use the Jupyter notebooks in

Appendix D and Appendix E for BLE testing and Wi-Fi testing, respectively.

117

Appendix D. Multiclass BLE Classification

1. # coding: utf-8 2. 3. # In[1]: 4. 5. 6. from MulticlassDCP import BLEMulticlassDCP 7. 8. # General data processing 9. import numpy as np 10. import pandas as pd 11. 12. # Plotting 13. import matplotlib.pyplot as plt 14. import seaborn as sns 15. 16. # ML libraries 17. from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 18. from sklearn.ensemble import RandomForestClassifier 19. from sklearn.neighbors import KNeighborsClassifier 20. from sklearn.metrics import balanced_accuracy_score, precision_score, recall_sco

re, confusion_matrix, matthews_corrcoef 21. from sklearn.model_selection import GridSearchCV, cross_val_score 22. from imblearn.under_sampling import RandomUnderSampler 23. 24. # System libraries 25. import itertools 26. import random, time 27. 28. # Warning filtering 29. import warnings 30. warnings.filterwarnings("ignore", category=FutureWarning) 31. warnings.filterwarnings("ignore", category=UserWarning) 32. warnings.filterwarnings("ignore", category=DeprecationWarning) 33. plt.rcParams.update({'figure.max_open_warning': 0}) 34. 35. 36. # # Create, process dataframe 37. 38. # In[2]: 39. 40. 41. start_time = time.time() 42. 43. 44. # In[3]: 45. 46. 47. b = BLEMulticlassDCP() 48. 49. 50. # In[4]: 51. 52. 53. df = b.make_dataframe()

118

54. 55. 56. # # Prep dataset 57. 58. # In[5]: 59. 60. 61. # BLE: Define which features to use 62. features_list = [ 63. # Packet info 64. "PacketLength", "BLE_LL_Length", 65. 66. # Associate Packets 67. "Assoc_Packets", 68. 69. # Channel number 70. "Channel_0", "Channel_12", "Channel_39", 71. 72. # PDU Type 73. "SCAN_RSP", "ADV_IND", "SCAN_REQ", 74. "CONNECT_REQ", "ADV_NONCONN_IND", "ADV_DIRECT_IND"] 75. 76. y_list = ["door", "lock", "temp"] 77. 78. 79. # In[6]: 80. 81. 82. # Prep training set 83. df_train = df[df['Set']=='train'] 84. print df_train['DeviceType'].value_counts() 85. 86. 87. # In[7]: 88. 89. 90. df_test = df[df['Set']=='test'] 91. 92. # Show initial test set imbalance 93. print "Initial test set distribution:" 94. print df_test['DeviceType'].value_counts() 95. df_test['DeviceType'].value_counts().sort_index().plot(kind='bar', title="Test P

acket Counts Before Resampling",logy=True); 96. 97. 98. # In[8]: 99. 100. 101. # Downsample test set so that there is equal chance that the classifier

will choose any given class 102. rds = RandomUnderSampler(random_state=42) 103. test_X_downsampled, test_y_downsampled = rds.fit_resample(df_test[featur

es_list], df_test['DeviceType']) 104. 105. # Show class counts after downsampling 106. unique, counts = np.unique(test_y_downsampled, return_counts=True) 107. print np.asarray((unique, counts)).T 108.

119

109. 110. # In[9]: 111. 112. 113. # Recreate df_test 114. df_test_downsampled = pd.DataFrame(test_X_downsampled,columns=features_l

ist) 115. df_test_downsampled['DeviceType'] = test_y_downsampled 116. 117. 118. # # Run multiclass on all features 119. 120. # In[10]: 121. 122. 123. multiclass_start = time.time() 124. 125. preds, metrics, cms, feature_importance = b.run_multiclass(df_train, df_

test_downsampled, features_list, y_list) 126. 127. multiclass_end = time.time() - multiclass_start 128. 129. 130. # ## Report results 131. 132. # ### Report confusion matrices 133. 134. # In[11]: 135. 136. 137. b.plot_all_confusion_matrices(cms, y_list) 138. 139. 140. # ### Report metrics 141. 142. # In[12]: 143. 144. 145. metrics_df = b.report_metrics(metrics, y_list, 'ble-

multiclass_metrics') 146. display(metrics_df) 147. 148. 149. # ### Report feature importance 150. 151. # In[13]: 152. 153. 154. f_i = b.report_featureimportance(feature_importance, features_list) 155. display(f_i) 156. 157. 158. # # Residuals Analysis 159. 160. # ## Use *k* top features only 161. 162. # ### Find *k* where *k* is the count of features that yields best BACC

120

163. 164. # In[14]: 165. 166. 167. fs_start = time.time() 168. 169. # Find best features using KBest scheme 170. feature_selection = [] 171. for i in range(0,len(f_i)): 172. top_features = list(f_i.index[0:i+1]) 173. 174. tf_preds, tf_metrics, tf_cms, tf_feature_importance = b.run_multicla

ss(df_train, df_test_downsampled, top_features, y_list,use_tuned=False) 175. tf_metrics_df = b.report_metrics(tf_metrics, y_list, to_csv=False) 176. 177. ave = np.average(tf_metrics_df['Mean_Recall']) 178. feature_selection.append(ave) 179. 180. fs_end = time.time() - fs_start 181. 182. 183. # In[15]: 184. 185. 186. k = feature_selection.index(max(feature_selection)) 187. print 'Best Mean Recall',max(feature_selection),":", k+1, "features" 188. 189. 190. # In[16]: 191. 192. 193. feature_selection 194. 195. 196. # ### Run multiclass with top *k* features (*k* = 3) 197. 198. # In[17]: 199. 200. 201. # Run multiclass with top 3 features 202. tf_preds, tf_metrics, tf_cms, tf_feature_importance = b.run_multiclass(d

f_train, df_test_downsampled, list(f_i.index[0:k+1]), y_list,use_tuned=False) 203. 204. 205. # In[18]: 206. 207. 208. tf_metrics_df = b.report_metrics(tf_metrics, y_list, 'ble-

topfeatures_metrics') 209. display(tf_metrics_df) 210. 211. 212. # ### Plot confusion matrices 213. 214. # In[19]: 215. 216. 217. b.plot_all_confusion_matrices(tf_cms, y_list)

121

218. 219. 220. # ### Report feature importance of TF3 221. 222. # In[20]: 223. 224. 225. b.report_featureimportance(tf_feature_importance, f_i.index[0:k+1]) 226. 227. 228. # ## Error analysis 229. 230. # **Error 1**: The main error across all classifiers is the misclassific

ation of door devices as temp devices. 231. 232. # In[21]: 233. 234. 235. # Get door and temp packets 236. df_train_doortemp = df_train[(df_train['DeviceType']=='door') | (df_trai

n['DeviceType']=='temp')] 237. 238. df_test_doortemp = df_test_downsampled[(df_test_downsampled['DeviceType'

]=='door') | (df_test_downsampled['DeviceType']=='temp')] 239. 240. 241. # In[22]: 242. 243. 244. # Run multiclass on just the two device types 245. doortemp_preds, doortemp_metrics, doortemp_cms, doortemp_feature_importa

nce = b.run_multiclass(df_train_doortemp, df_test_doortemp, features_list, ['door','temp'],use_tuned=False)

246. 247. 248. # In[23]: 249. 250. 251. doortemp_metrics_df = b.report_metrics(doortemp_metrics, ['door','temp']

, 'ble-doortemp_metrics') 252. display(doortemp_metrics_df) 253. 254. 255. # In[24]: 256. 257. 258. b.plot_all_confusion_matrices(doortemp_cms, ['door','temp']) 259. 260. 261. # It appears that with just the two classes, the classifiers cannot dist

inguish between the two devices. The next step is to look at the feature selection

262. 263. # In[25]: 264. 265. 266. b.report_featureimportance(doortemp_feature_importance, features_list) 267.

122

268. 269. # In[26]: 270. 271. 272. # Run door vs temp with top 3 features 273. dt3_preds, dt3_metrics, dt3_cms, dt3_feature_importance = b.run_multicla

ss(df_train_doortemp, df_test_doortemp, features_list[0:3], ['door','temp'],use_tuned=False)

274. 275. 276. # In[27]: 277. 278. 279. b.report_metrics(dt3_metrics, ['door','temp'], 'ble-dt3_metrics') 280. 281. 282. # No difference with top 3 features. 283. 284. # In[28]: 285. 286. 287. f, axes = plt.subplots(3, 1, figsize=(6, 14)) 288. sns.countplot(x='DeviceType', hue='PacketLength',ax=axes[0], data=df_tes

t_doortemp); 289. sns.countplot(x='DeviceType', hue='BLE_LL_Length',ax=axes[1], data=df_te

st_doortemp); 290. sns.countplot(x='DeviceType', hue='Assoc_Packets',ax=axes[2], data=df_te

st_doortemp); 291. 292. 293. # # Report times 294. 295. # In[29]: 296. 297. 298. print multiclass_end 299. print fs_end 300. end_time = time.time() - start_time 301. print end_time

123

Appendix E. Multiclass Wi-Fi Classification

1. # coding: utf-8 2. 3. # In[1]: 4. 5. 6. # from Pipeline import WifiPipeline 7. from MulticlassDCP import WifiMulticlassDCP 8. 9. # General data processing 10. import numpy as np 11. import pandas as pd 12. 13. # Plotting 14. import matplotlib.pyplot as plt 15. import seaborn as sns 16. 17. # ML libraries 18. from sklearn.discriminant_analysis import LinearDiscriminantAnalysis 19. from sklearn.ensemble import RandomForestClassifier 20. from sklearn.neighbors import KNeighborsClassifier 21. from sklearn.metrics import balanced_accuracy_score, precision_score, recall_sco

re, confusion_matrix, matthews_corrcoef 22. from sklearn.model_selection import GridSearchCV, cross_val_score 23. from imblearn.under_sampling import RandomUnderSampler 24. 25. # System libraries 26. import itertools 27. import random, time 28. 29. # Warning filtering 30. import warnings 31. warnings.filterwarnings("ignore", category=FutureWarning) 32. warnings.filterwarnings("ignore", category=UserWarning) 33. warnings.filterwarnings("ignore", category=DeprecationWarning) 34. plt.rcParams.update({'figure.max_open_warning': 0}) 35. 36. 37. # # Create, process dataframe 38. 39. # In[2]: 40. 41. 42. start_time = time.time() 43. 44. 45. # In[3]: 46. 47. 48. w = WifiMulticlassDCP() 49. 50. 51. # In[4]: 52. 53.

124

54. df = w.make_dataframe() 55. 56. # Take out packets from router 57. df = df[df["DeviceType"]!="router"] 58. print len(df) 59. 60. 61. # # Prep dataset 62. 63. # In[5]: 64. 65. 66. # Wifi: Define which features to use 67. features_list = [ 68. # Packet info 69. "PacketLength", 70. 71. # Vendor 72. "Belkin", "Dropcam", "Lifi", "Tp-link", 73. 74. # 802.11 Data subtype 75. "Data", "QoS_Data", "QoS_Null", 76. 77. # Associated Packets 78. "Assoc_Packets"] 79. 80. # Define what the response classes are 81. y_list = ["bulb", "camera", "plug"] 82. 83. 84. # In[6]: 85. 86. 87. # Prep training set 88. df_train = df[df['Set']=='train'] 89. print df_train['DeviceType'].value_counts() 90. 91. 92. # In[7]: 93. 94. 95. df_test = df[df['Set']=='test'] 96. 97. # Show initial test set imbalance 98. print "Initial test set distribution:" 99. print df_test['DeviceType'].value_counts() 100. df_test['DeviceType'].value_counts().sort_index().plot(kind='bar', title

="Test Packet Counts Before Resampling",logy=True); 101. 102. 103. # In[8]: 104. 105. 106. # Downsample test set so that there is equal chance that the classifier

will choose any given class 107. rds = RandomUnderSampler(random_state=42) 108. test_X_downsampled, test_y_downsampled = rds.fit_resample(df_test[featur

es_list], df_test['DeviceType'])

125

109. 110. # Show class counts after downsampling 111. unique, counts = np.unique(test_y_downsampled, return_counts=True) 112. print np.asarray((unique, counts)).T 113. 114. 115. # In[9]: 116. 117. 118. # Recreate df_test 119. df_test_downsampled = pd.DataFrame(test_X_downsampled,columns=features_l

ist) 120. df_test_downsampled['DeviceType'] = test_y_downsampled 121. 122. 123. # # Run multiclass 124. 125. # In[10]: 126. 127. 128. multiclass_start = time.time() 129. 130. preds, metrics, cms, feature_importance = w.run_multiclass(df_train, df_

test_downsampled, features_list, y_list) 131. 132. multiclass_end = time.time() - multiclass_start 133. 134. 135. # # Report results 136. 137. # ## Report confusion matrices 138. 139. # In[11]: 140. 141. 142. w.plot_all_confusion_matrices(cms, y_list) 143. plt.savefig('Results/CM/wifi-cm-full.png') 144. 145. 146. # ## Report metrics 147. 148. # In[12]: 149. 150. 151. metrics_df = w.report_metrics(metrics, y_list, 'wifi-

multiclass_metrics') 152. display(metrics_df) 153. 154. 155. # ## Report feature importance 156. 157. # In[13]: 158. 159. 160. f_i = w.report_featureimportance(feature_importance, features_list) 161. display(f_i) 162. 163.

126

164. # # Residuals Analysis 165. 166. # ## Use only top 3 features 167. 168. # ### Run multiclass with top 3 features 169. 170. # In[14]: 171. 172. 173. fs_start = time.time() 174. # Run multiclass with top 3 features 175. tf3_preds, tf3_metrics, tf3_cms, tf3_feature_importance = w.run_multicla

ss(df_train, df_test_downsampled, list(f_i.index[0:3]), y_list, use_tuned=False, use_priors=True)

176. 177. fs_end = time.time() - fs_start 178. 179. 180. # In[15]: 181. 182. 183. tf3_metrics_df = w.report_metrics(tf3_metrics, y_list, 'wifi-

tf3_metrics') 184. display(tf3_metrics_df) 185. 186. 187. # ### Plot confusion matrices 188. 189. # In[16]: 190. 191. 192. w.plot_all_confusion_matrices(tf3_cms, y_list) 193. plt.savefig('Results/CM/wifi-cm-best3.png') 194. 195. 196. # ### Report feature importance of 3 best 197. 198. # In[17]: 199. 200. 201. w.report_featureimportance(tf3_feature_importance, f_i.index[0:3]) 202. 203. 204. # ## Remove vendor features 205. 206. # In[18]: 207. 208. 209. # Remove vendors features 210. nv_features = ['PacketLength', 'Data', 'QoS_Data', 'QoS_Null', 'Assoc_Pa

ckets'] 211. 212. 213. # In[19]: 214. 215. 216. nv_start = time.time() 217.

127

218. # Run multiclass without vendors 219. nv_preds, nv_metrics, nv_cms, nv_feature_importance = w.run_multiclass(d

f_train, df_test_downsampled, nv_features, y_list, use_tuned=False) 220. 221. nv_end = time.time() - nv_start 222. 223. 224. # In[20]: 225. 226. 227. nv_metrics_df = w.report_metrics(nv_metrics, y_list, 'wifi-

novendor_metrics') 228. display(nv_metrics_df) 229. 230. 231. # In[21]: 232. 233. 234. w.plot_all_confusion_matrices(nv_cms, y_list) 235. plt.savefig('Results/CM/wifi-cm-novendor.png') 236. 237. 238. # In[22]: 239. 240. 241. w.report_featureimportance(nv_feature_importance, nv_features) 242. 243. 244. # ## Error analysis 245. 246. # In[23]: 247. 248. 249. def output_decisionpath(model, features_list, class_names, filename): 250. # Source: https://towardsdatascience.com/how-to-visualize-a-

decision-tree-from-a-random-forest-in-python-using-scikit-learn-38ad2d75f21c 251. 252. # Extract single tree 253. estimator = model.estimators_[5] 254. 255. from sklearn.tree import export_graphviz 256. # Export as dot file 257. export_graphviz(estimator, out_file='tree.dot', 258. feature_names = features_list, 259. class_names = class_names, 260. rounded = True, proportion = False, 261. precision = 2, filled = True) 262. 263. # Convert to png using system command (requires Graphviz) 264. from subprocess import call 265. call(['dot', '-Tpng', 'tree.dot', '-

o', 'Results/'+filename+'.png', '-Gdpi=600']) 266. 267. 268. # **Error 1**: KNN and RF confuse camera and plugs 269. 270. # In[24]: 271.

128

272. 273. # Get camera and plug packets 274. df_train_camplugs = df_train[(df_train['DeviceType']=='camera') | (df_tr

ain['DeviceType']=='plug')] 275. 276. df_test_camplugs = df_test_downsampled[(df_test_downsampled['DeviceType'

]=='camera') | (df_test_downsampled['DeviceType']=='plug')] 277. 278. 279. # Isolate the two classes 280. 281. # In[25]: 282. 283. 284. # Run multiclass on just the two device types 285. camplugs_preds, camplugs_metrics, camplugs_cms, camplugs_feature_importa

nce = w.run_multiclass(df_train_camplugs, df_test_camplugs, features_list, ['camera','plug'], use_tuned=False, use_priors=False)

286. 287. 288. # In[26]: 289. 290. 291. camplugs_metrics_df = w.report_metrics(camplugs_metrics, ['camera','plug

'], 'wifi-camplugs_metrics') 292. display(camplugs_metrics_df) 293. 294. 295. # In[27]: 296. 297. 298. w.plot_all_confusion_matrices(camplugs_cms, ['camera','plug']) 299. plt.savefig('Results/CM/wifi-cm-camplug-full.png') 300. 301. 302. # In[28]: 303. 304. 305. w.report_featureimportance(camplugs_feature_importance, features_list) 306. 307. 308. # Show features 309. 310. # In[29]: 311. 312. 313. f, axes = plt.subplots(3, 1, figsize=(6, 14)) 314. sns.countplot(x='DeviceType', hue='Belkin',ax=axes[0], data=df_test_camp

lugs); 315. sns.countplot(x='DeviceType', hue='Dropcam',ax=axes[1], data=df_test_cam

plugs); 316. sns.countplot(x='DeviceType', hue='Assoc_Packets',ax=axes[2], data=df_te

st_camplugs); 317. plt.legend(loc='upper right'); 318. 319. 320. # Use only top 3 features for camera vs plugs classification 321.

129

322. # In[30]: 323. 324. 325. # Run multiclass on just the two device types 326. camplugs_preds_bf, camplugs_metrics_bf, camplugs_cms_bf, camplugs_featur

e_importance_bf= w.run_multiclass(df_train_camplugs, df_test_camplugs, ['Belkin','Dropcam','Assoc_Packets'], ['camera','plug'], use_tuned=False, use_priors=False)

327. 328. 329. # In[31]: 330. 331. 332. camplugs_metrics__bf_df = w.report_metrics(camplugs_metrics_bf, ['camera

','plug'], 'wifi-camplugs_bestfeatures_metrics') 333. display(camplugs_metrics__bf_df) 334. 335. 336. # In[32]: 337. 338. 339. w.plot_all_confusion_matrices(camplugs_cms_bf, ['camera','plug']) 340. plt.savefig('Results/CM/wifi-cm-camplug-best3.png') 341. 342. 343. # In[33]: 344. 345. 346. w.report_featureimportance(camplugs_feature_importance_bf, ['Belkin','Dr

opcam','Assoc_Packets']) 347. 348. 349. # In[34]: 350. 351. 352. print multiclass_end 353. print fs_end 354. print nv_end 355. print time.time() - start_time

130

Appendix F. Classification Analysis Graphs

Figure 44. BLE Door Sensors vs. Temperature Sensors Features

131

Figure 45. Wi-Fi Cameras vs. Plugs Features

132

Bibliography

1. United States Government Accountability Office, “Internet of Things: Enhanced Assessments and Guidance Are Needed to Address Security Risks in DoD,” 2017, [Online]. Available: https://www.gao.gov/assets/690/686203.pdf.

2. S. M. Beyer, “Pattern-of-Life Modeling using Data Leakage in Smart Homes,” Air

Force Institute of Technology, 2018, [Online]. Available: https://scholar.afit.edu/cgi/viewcontent.cgi?article=2793&context=etd.

3. A. Nordrum, “Popular Internet of Things Forecast of 50 Billion Devices by 2020 Is

Outdated,” IEEE Spectrum, 2016. [Online]. Available: https://spectrum.ieee.org/tech-talk/telecom/internet/popular-internet-of-things-forecast-of-50-billion-devices-by-2020-is-outdated. [Accessed: 06-Jun-2018].

4. A. Razaque, P. Oddo, F. H. Amsaad, M. Sangavikar, S. Manchikatla, and Niraj,

“Power reduction for Smart Homes in an Internet of Things framework,” in IEEE International Conference on Electro Information Technology, Grand Forks, North Dakota, USA, 2016, pp. 117–121.

5. M. S. Gast, 802.11 Wireless Networks: The Definitive Guide, 2nd ed. O’Reilly

Media, Inc., 2005. 6. R. Heydon, Bluetooth Low Energy: The Developer’s Handbook. Prentice Hall,

2012. 7. “Bluetooth Core Specifications,” 2018. [Online]. Available:

https://www.bluetooth.com/specifications/bluetooth-core-specification. [Accessed: 06-Apr-2018].

8. K. Townsend, “Introduction to Bluetooth Low Energy - GATT.” [Online].

Available: https://learn.adafruit.com/introduction-to-bluetooth-low-energy/gatt. [Accessed: 19-Jul-2018].

9. Anders Strand, “Bluetooth Low Energy: Central Tutorial,” Nordic Semiconductor,

2016. [Online]. Available: https://devzone.nordicsemi.com/tutorials/b/bluetooth-low-energy/posts/ble-central-tutorial. [Accessed: 06-May-2018].

10. A. L. Samuel, “Some studies in machine learning using the game of checkers,” IBM

Journal of Research and Development, vol. 3, no. 3, p. 210, 1959. 11. A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow:

Concepts, Tools, and Techniques to Build Intelligent Systems, 1st ed. O’Reilly Media, Inc., 2017.

133

12. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2016.

13. F. Provost and T. Fawcett, Data Science for Business. O’Reilly, 2013. 14. G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical

Learning, 1st ed. Springer, 2013. 15. A. Padmanabha and C. Williams, “K-nearest Neighbors,” Brilliant.org. [Online].

Available: https://brilliant.org/wiki/k-nearest-neighbors/. [Accessed: 20-Aug-2018].

16. J. Erman, A. Mahanti, and M. Arlitt, “Internet Traffic Identification using Machine

Learning,” in Global Telecommunications Conference, 2006. GLOBECOM ’06. IEEE, San Francisco, California, USA, 2006, pp. 1–6.

17. T. Karagiannis, A. Broido, M. Faloutsos, and K. Claffy, “Transport layer

identification of P2P traffic,” in Proceedings of the 4th ACM SIGCOMM conference on Internet measurement - IMC ’04, Taormina, Sicily, Italy, 2004, p. 121.

18. T. T. T. Nguyen and G. Armitage, “A survey of techniques for internet traffic

classification using machine learning,” Communications Surveys & Tutorials, IEEE, vol. 10, no. 4, pp. 56–76, 2008.

19. B. J. McGregor A., Hall M., Lorier P., “Flow Clustering Using Machine Learning

Techniques.,” in Passive and Active Network Measurement, 2004. 20. L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, and K. Salamatian, “Traffic

Classification on the Fly,” ACM Special Interest Group on Data Communication (SIGCOMM) Computer Communication Review, vol. 36, no. 2, 2006.

21. J. Erman, A. Mahanti, M. Arlitt, and C. Williamson, “Identifying and

Discriminating Between Web and Peer-To-Peer Traffic in the Network Core,” in WWW ’07: Proceedings of the 16th International Conference on World Wide Web, Banff, Alberta, Canada, pp. 883–892.

22. M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, “Class-of-service mapping for

QoS: A statistical signature-based approach to IP traffic classification,” in Proceedings of the 4th ACM/SIGCOMM on Internet Measurement Conference (IMC), San Diego, California, USA, 2004.

23. A. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis

techniques,” in ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Banff, Alberta, Canada, 2005.

134

24. T. Auld, A. W. Moore, and S. F. Gull, “Bayesian neural networks for internet traffic classification.,” IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 223–239, 2007.

25. J. S. Atkinson, “Your WiFi Is Leaking: Inferring Private User Information Despite

Encryption,” University College London, 2015, [Online]. Available: http://discovery.ucl.ac.uk/1470734/.

26. B. Copos, K. Levitt, M. Bishop, and J. Rowe, “Is Anybody Home? Inferring Activity

from Smart Home Network Traffic,” Proceedings - 2016 IEEE Symposium on Security and Privacy Workshops, SPW 2016, pp. 245–251, 2016.

27. Y. Meidan et al., “ProfilIoT: A machine learning approach for IoT device

identification based on network traffic analysis,” in Proceedings of the ACM Symposium on Applied Computing, Marrakech, Morocco, 2017, pp. 506–509.

28. P. Wang, F. Ye, X. Chen, and Y. Qian, “Datanet: Deep learning based encrypted

network traffic classification in SDN home gateway,” IEEE Access, vol. 6, pp. 55380–55391, 2018.

29. “Plugable Bluetooth Adapter.” [Online]. Available:

https://plugable.com/products/usb-bt4le/. [Accessed: 05-Feb-2019]. 30. “Alfa Indoor WiFi USB Antenna.” [Online]. Available:

https://www.alfa.com.tw/WiFi USB Antenna.html. [Accessed: 05-Feb-2019]. 31. “Ubertooth One.” [Online]. Available:

https://www.wallofsheep.com/collections/lan-taps/products/ubertooth-one-fully-assembled. [Accessed: 05-Feb-2019].

32. Radiocommunication Sector of International Telecommunications Union, “Isolation

between antennas of IMT base stations in the land mobile service,” 2011. [Online]. Available: https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-M.2244-2011-PDF-E.pdf. [Accessed: 17-Dec-2018].

33. P. Roshan and J. Leary, 802.11 Wireless LAN Fundamentals. Cisco, 2013. 34. “macvendors.co.” [Online]. Available: http://macvendors.co/. [Accessed: 05-Feb-

2019]. 35. “IEEE Standards Association Registration Authority.” [Online]. Available:

https://regauth.standards.ieee.org/standards-ra-web/pub/view.html#registries. [Accessed: 05-Feb-2019].

135

36. “Documentation of scikit-learn 0.20.2.” [Online]. Available: https://scikit-learn.org/stable/documentation.html. [Accessed: 13-Dec-2018].

37. “Jupyter Notebook Documentation.” [Online]. Available: https://jupyter-

notebook.readthedocs.io/en/stable/. [Accessed: 05-Feb-2019]. 38. G. Jurman, S. Riccadonna, and C. Furlanello, “A comparison of MCC and CEN

error measures in multi-class prediction,” PLoS ONE, vol. 7, no. 8, 2012. 39. J. Gorodkin, “Comparing two K-category assignments by a K-category correlation

coefficient,” Computational Biology and Chemistry, vol. 28, no. 5–6, pp. 367–374, 2004.

40. J. Tang, S. Alelyani, and H. Liu, “Feature selection for classification: A review,”

Data Classification: Algorithms and Applications, pp. 37–64, 2014.

Standard Form 298 (Rev. 8/98)

REPORT DOCUMENTATION PAGE

Prescribed by ANSI Std. Z39.18

Form Approved OMB No. 0704-0188

The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing the burden, to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS. 1. REPORT DATE (DD-MM-YYYY) 2. REPORT TYPE 3. DATES COVERED (From - To)

4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER

5b. GRANT NUMBER

5c. PROGRAM ELEMENT NUMBER

5d. PROJECT NUMBER

5e. TASK NUMBER

5f. WORK UNIT NUMBER

6. AUTHOR(S)

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATIONREPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR'S ACRONYM(S)

11. SPONSOR/MONITOR'S REPORTNUMBER(S)

12. DISTRIBUTION/AVAILABILITY STATEMENT

13. SUPPLEMENTARY NOTES

14. ABSTRACT

15. SUBJECT TERMS

16. SECURITY CLASSIFICATION OF:a. REPORT b. ABSTRACT c. THIS PAGE

17. LIMITATION OFABSTRACT

18. NUMBEROFPAGES

19a. NAME OF RESPONSIBLE PERSON

19b. TELEPHONE NUMBER (Include area code)

21-03-2019 Master's Thesis Sept 2017 - March 2019

Evaluating Machine Learning Techniques for Smart Home Device Classification

19G437 Aragon, Angelito E., Capt, USAF

Air Force Institute of Technology Graduate School of Engineering and Management (AFIT/EN) 2950 Hobson Way Wright-Patterson AFB OH 45433-7765


Joseph A. Misher Department of Homeland Security Advanced Technology Security Division, Federal Protective Service 800 North Capitol Street NW, Washington D.C. 20001 COMM: 202-658-8806 EMAIL: [email protected]

DHS

Distribution Statement A: Approved for Public Release; Distribution Unlimited

This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.

Smart devices in the internet of things have transformed the management of personal and industrial spaces. Recent research has shown that it is possible to model a subject’s pattern-of-life through Bluetooth Low Energy (BLE) and Wi-Fi smart device data leakage. A key step is the identification of the device types within the smart home. This research hypothesizes that machine learning algorithms can be used to accurately perform the classification of smart home devices. A smart home environment was built using various BLE and Wi-Fi devices to create realistic traffic for machine learning classification. A device classification pipeline was designed to collect traffic and extract features. K-nearest neighbors , linear discriminant analysis, and random forests classifiers were built and tuned for experimental testing. Performance was evaluated using the Matthews correlation coefficient, mean recall, and mean precision metrics. Experimental results provide support towards the hypothesis that machine learning can classify device types to a high level of performance, but more work is necessary to build a more robust classifier.

Internet of Things, Machine Learning, Classification, Smart Home, IoT

U U U UU 151

Dr. Barry E. Mullins, AFIT/ENG

(937) 255-3636 x7979 [email protected]

Reset

Documents

EVALUATING MACHINE LEARNING TECHNIQUES FOR SMART … · The Internet of Things (IoT) is expected to have a network of over 31 billion devices by 2020 . IoT devices have the potential