
Implementation of a Resource-Aware, Data Clustering Algorithm for the Sun SPOT Distributed Query Processor (SSDQP)

Quincy Chi Kwan Tse (200310185) – 100% contribution

Supervisor: Uwe Röhm

Submitted: 28 May 07


Title:

Implementation of a resource-aware, data clustering algorithm for the Sun SPOT Distributed

Query Processor

Author:

Quincy Tse

School of Information Technologies

Faculty of Engineering and Information Technologies

The University of Sydney

Australia

Abstract

Wireless sensor network (WSN) nodes have very limited battery life; a current topic in WSN research is therefore to extend their operational life for as long as possible. In this project, a resource-aware algorithm for clustering (aggregating) streaming data collected by sensor nodes has been implemented in the School of Information Technologies' WSN research project, the Sun SPOT Distributed Query Processor (SSDQP). As part of this extension, a new framework has been developed to cater for future implementations of different data stream processing algorithms in SSDQP. A new mechanism was also developed to support any resource-aware components implemented in SSDQP.

Acknowledgements

I would like to take this opportunity to thank my supervisor for guiding me through the project and keeping me on track. Without his guidance, I would probably have drifted to work on some related but not assessable problem, and would never have got this finished.


Table of Contents

1 Introduction .......................................... 1
1.1 Background .......................................... 1
1.2 Related Work ........................................ 2
1.3 Organisation of the report .......................... 3
2 System Design and Implementation ...................... 4
2.1 Original Design of SSDQP ............................ 4
2.2 System Design Extensions ............................ 6
2.3 System Implementation ............................... 7
3 Background on Tools and Concepts ...................... 12
3.1 Artifacts used ...................................... 12
3.2 Coding Methods ...................................... 13
3.3 Evaluation Methods .................................. 14
4 User Operations ....................................... 15
4.1 User Operations ..................................... 15
4.2 Operational Scenarios ............................... 16
5 Evaluation ............................................ 21
5.1 Tests for Correctness ............................... 21
5.2 Tests for Performance ............................... 23
5.3 Limits on the Extension ............................. 23
5.4 Evaluation of the Extension ......................... 24
6 Individual Reflections ................................ 25
7 Conclusion ............................................ 27
8 Future Work ........................................... 28
9 References ............................................ 30

Table of Figures

Figure 1 The version of Sun SPOT used (bSPOT) ........... 1
Figure 2 The newer version of Sun SPOT (eSPOT) .......... 1
Figure 3 SSDQP Design (taken from [8]) .................. 4
Figure 4 SSDQP Extensions (green – new addition, blue – modification) ... 1
Figure 5 ERA Clustering algorithm ....................... 1
Figure 6 Components of Sun SPOT (eSPOT) [13] ............ 12

List of Tables

Table 1 BNF grammar definition for SSDQP query language at the start of the project ... 1
Table 2 SSDQP Language extensions – the rest of Table 1 remained unmodified ........... 8
Table 3 Specifications of the Sun SPOT (taken from [13]) .............................. 12
Table 4 Sample compiler results ....................................................... 22


1 Introduction

1.1 Background

This project is part of the research conducted by the School of Information Technologies on data processing in wireless sensor networks. The objective of this project is to develop an extension to the Sun SPOT Distributed Query Processor (SSDQP) in order to evaluate the performance of the Enhanced Resource Aware (ERA) clustering algorithm proposed by Gaber et al. [4] in live operation of a typical wireless sensor network.

In implementing the ERA clustering algorithm, a number of supporting infrastructure components were developed. These components provide the capability to sense available resources (CPU utilisation, free memory and remaining battery level) as well as to retrieve the instantaneous state of an ERA cluster.

1.1.1 Wireless Sensor Networks

A wireless sensor network (WSN) is a group of wirelessly connected, battery-operated embedded devices equipped with one or more sensors. Unlike other wireless networks, a WSN is designed to contain a large number of nodes that operate without operator intervention. Typically, these devices are various types of environmental sensors capable of performing some calculations, and are able to forward their measurements to a central location for further processing.

WSNs are relatively cheap to produce and deploy, and are typically used to monitor a large

area. Examples of WSN applications include collecting tidal data on a coastline, monitoring the

health of the Murray River and monitoring the stress on dam walls.

Due to the battery-operated nature of WSN nodes, resources at each node are very constrained. Nodes have a very limited battery life. Consequently, the nodes' processing power, radio interface and memory have also been designed to be limited in order to conserve battery. In addition, the radio interface is unreliable due to low transmission power and internal antennae.

1.1.2 Enhanced Resource Aware Clustering

Studies [11] have identified radio communication as the most energy-expensive operation on a sensor node: the power used in transmitting one bit is enough to perform up to 1000 instructions [12]. In order to reduce the amount of data transmitted across the network, the Enhanced Resource Aware (ERA) clustering algorithm [3], [4] was designed to summarise sensor data for storage and transmission.

The clustering algorithm in the ERA cluster summarises the sensed data by attempting to place each measurement into an appropriate cluster. The most appropriate cluster is the one whose Manhattan distance to the measurement is the smallest and below a specified threshold. The use of the Manhattan distance allows the clustering algorithm to operate multi-dimensionally.

Since measurements on the environment tend to have quite a large temporal redundancy [3],

by locally summarising the measurements to value ranges (represented by the mean of the

measured values) and their respective frequencies (the number of measurements in each

“bucket”) over a period of time, the storage and transmission requirements can be greatly

reduced.
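As a sketch of this idea, the Manhattan distance used to pick the closest cluster can be computed as follows. The class and method names below are my own, not SSDQP's:

```java
// Illustrative sketch only: the Manhattan (L1) distance used by ERA
// clustering to find the closest cluster centre. Names are hypothetical,
// not taken from the SSDQP source.
class ManhattanSketch {
    // Manhattan distance between a measurement and a cluster mean,
    // summed over all sensed dimensions (e.g. temp, light, x, y, z).
    static double distance(double[] measurement, double[] clusterMean) {
        double sum = 0.0;
        for (int i = 0; i < measurement.length; i++) {
            sum += Math.abs(measurement[i] - clusterMean[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // A two-dimensional example: |1 - 3| + |2 - 5| = 5
        System.out.println(distance(new double[]{1, 2}, new double[]{3, 5}));
    }
}
```

Because the per-dimension differences are simply summed, the same code works for any number of sensed attributes, which is what allows the algorithm to operate multi-dimensionally.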


Figure 1 The version of Sun SPOT used (bSPOT)

In addition, the ERA clustering algorithm also automatically adapts the processing to the

available resource level. Whenever the amount of free memory falls below a preset threshold,

the algorithm will increase the radius of each cluster (thus the range of values each cluster

contains) in order to discourage the formation of new clusters. It will also remove inactive and

outlier clusters.

1.1.3 Sun SPOT Distributed Query Processor

In addition to minimising energy use, the flexibility and usability of a WSN are also important for some applications. The Sun SPOT Distributed Query Processor (SSDQP) was developed to address the usability of the WSN by providing a declarative query interface using a SQL-like language for data abstraction. SSDQP also provides time synchronisation, in-network processing, a remote multi-client GUI, and a scheduling mechanism that enables multi-tasking support [11].

Data abstraction allows users to easily specify the data to be gathered without the need to

instruct the WSN how to execute the query. The user specifies the data to be retrieved by

entering a SQL-like query into a GUI. The GUI then forwards the request to the host program

through a standard TCP/IP socket connection. This query is then parsed and compiled into a

high-level execution plan. Through the base station node connected to the host machine, this

execution plan is disseminated into the network. Each node then decides and schedules the

actual tasks to perform based on the received plan. The time-triggered scheduler in each node executes the tasks as scheduled, and recurrent tasks are rescheduled to run at the next specified time.
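As an illustration of this flow, a query in the SQL-like language (attribute names follow the grammar given later in the report; the specific values are invented for illustration) might be:

```
SELECT node, time, AVG(temp)
WHERE light > 100
GROUPBY node
EPOCH 30 seconds
RUNCOUNT 10
```

The host compiles such a query into a compact Token string, disseminates it via the base station, and each node schedules the corresponding recurrent task once per epoch until the run count is exhausted.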

By separating the GUI and the host program and communicating through TCP/IP, SSDQP allows multiple users to access WSN resources remotely and concurrently. In addition, by making the decision on the action to perform at the node, it allows for WSNs whose nodes have different capabilities and simplifies the deployment of new nodes into a WSN. Multi-tasking is supported by compiling queries into tasks and submitting them to the scheduler. Multiple queries are compiled into separate tasks, allowing multiple queries to be issued to a node concurrently.

SSDQP runs on WSN nodes called Sun SPOTs, which are Java VM based devices. These devices run Java natively, allowing programs written in this high-level language to be executed relatively efficiently. The version of the Sun SPOT used in SSDQP is the bSPOT (Figure 1). It is intended that the code be ported to the newer hardware, called the eSPOT (Figure 2), by July. Further information on these devices is contained in section 3.1.1.2.

1.2 Related Work

There are a number of other research groups that are currently investigating different

approaches to minimise communications to increase the life of the sensor nodes. The concept

of resource aware clustering actually originated from Teng et al.'s paper [10]. Other in-network clustering techniques like HEED [9] have also been developed. The majority of these proposals are implemented on minimal sensor devices such as MOTEs, which lack the higher level abstraction and libraries that are available in a virtual-machine based system.

Figure 2 The newer version of Sun SPOT (eSPOT)

In their paper, Madden et al. [12] presented TinyDB, which is considered the "gold standard" in the field. TinyDB also uses an SQL-like language for declarative querying. The query is parsed, compiled and optimised for overall power consumption, and is disseminated into the network in binary format. Queries in TinyDB may iterate until a STOP QUERY command is received by the node. TinyDB also provides event-based and lifetime-based queries in order to minimise unnecessary transmission of intermediate results. TinyDB provides stream processing capabilities, including sliding window aggregates, and claims to provide user-defined aggregates. Resource optimisations in TinyDB are primarily performed at the host when compiling the query into the execution plan. In contrast, SSDQP concentrates on providing dynamic resource adaptation on each node at runtime.

Mueller et al. [1] have developed SwissQM, which utilises in-network aggregation to reduce the data sent. SwissQM essentially extends a Java opcode-based virtual machine with sensor- and query-specific opcodes, running on top of TinyOS to provide higher level abstraction support. SwissQM's use of in-network clustering reduces data by eliminating spatial redundancies. It also provides the ability for the nodes to accept macro-like programs in order to provide extra data manipulation functionality such as aggregation operators. SwissQM compiles the query at the host into VM code before disseminating it to the nodes. A time-trigger mechanism is not currently supported.

In contrast, the ERA clustering algorithm aggregates the data collected in the node over a

period of time, thus exploiting the temporal redundancies that are inherent in the collected

data. The SSDQP extensions produced in this project also use a different concept to implement

new data manipulation functionalities. Instead of defining a new function that implements the

algorithm, it defines a special data storage container that implements the same interface as the

other data containers. This new container applies the aggregation algorithm either as new data

is added or as the stored data is retrieved (depending on the actual implementation). This allows

the intermediate states of the algorithm to be internalised, thus supporting multiple concurrent queries that use the same algorithm. Spatial redundancies can still be exploited by implementing any other cluster-based routing scheme underneath the clustering application.

SSDQP also provides dynamic resource adaptation within the nodes through the use of a resource monitor, which can alert resource-aware algorithms as resource levels change. In addition, by implementing a task scheduler within SSDQP, multiple queries may run on a node concurrently, allowing the network to be used by multiple applications at the same time.

1.3 Organisation of the report

This report is divided into eight chapters. The first chapter provides an introduction and the background to the task. Chapter 2 gives a general overview of the design and implementation of the extensions, and Chapter 3 provides background on the tools used and the concepts introduced in the system. The user operations and usage scenarios are contained in Chapter 4, while a summary of the testing procedures and the tests conducted is given in Chapter 5. Chapter 6 presents a reflection on the project, followed by the conclusion in Chapter 7. Suggestions on the future direction of the project are presented in Chapter 8.


2 System Design and Implementation

2.1 Original Design of SSDQP

Initially, SSDQP contained seven modules – the Communication, Routing, Networking, Global Time Synchronisation, Grammar, Sensing and Scheduler modules [8]. Other than Communication and Routing, all modules interact with the Scheduler for task scheduling. Most functions performed on a node are packaged into "TaskSets", which arrange the required actions ("Tasks") into an execution tree. New functionality can be added by extending an appropriate Task.

Figure 3 SSDQP Design (taken from [8])

2.1.1 Communication, Time Sync, Routing and Networking Modules

Together, these four modules handle all matters relating to the transmission and reception of messages over the radio interface. The Communication module deals with the various API calls needed to receive and transmit messages, whereas the Routing module contains the routing algorithm. The Networking module packages calls to these modules into Tasks and TaskSets for scheduling.

The Global Time Synchronisation module provides the singleton TimeKeeper class, which allows the system clock to be adjusted, as well as the TaskSets necessary for processing time synchronisation messages.

2.1.2 Grammar Module

The Grammar module uses recursive descent parsing. Two branches of the Grammar module are used in the project – one branch translates SQL queries into compact Tokens for network transmission and defines the query plan; the other parses the Tokens on the node and assembles a TaskSet for executing the specified query. The original query language is defined in Table 1.

The current implementation of SSDQP uses a two-step process to interpret an SQL query into a list of Tasks for the nodes to execute. Step 1 rewrites the verbose SQL query into shorter network Tokens using the SQLParser and TokenCompiler classes in order to minimise network transmission. Attribute names are translated to their numeric identifiers on the host computer to minimise communication cost. Step 2 interprets the received Token string and sequences the list of Tasks required to execute the query.


In order to extend the query language, new identifier tokens need to be defined, as well as the usage rules for the new syntax. Corresponding network Tokens and their syntax also need to be defined. The parser is then updated to translate the new identifiers into the new network tokens. New Tasks also need to be defined to provide the functionality, and the parser on the node is then updated to produce the new execution tree.

2.1.3 Sensing Module

The sensing module deals with the acquisition and processing of sensor data. Not only does this

module contain code to access the node’s sensors, it also contains all the code needed for result

table operations.

For each sensor in the node, a SensorTask is created to activate and retrieve information from

that sensor. In order to allow a new sensor to be used in SSDQP, a new SensorTask needs to be

defined, with the eval() method overridden to return the sensor value. Since the sense() method

in SenseManager activates each sensor and arranges the values in a result Table, it also needs to

be updated such that it will activate the new sensor.
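The extension point described above can be sketched as follows. SensorTask here is a minimal stand-in, not the real SSDQP class, and the humidity sensor is a hypothetical example:

```java
// Minimal stand-in for SSDQP's SensorTask extension point: a new sensor is
// added by subclassing and overriding eval() to return the sensor value.
abstract class SensorTask {
    // Overridden to activate the sensor and return its current reading.
    abstract long eval();
}

// Hypothetical new sensor; a real implementation would call the device
// driver here instead of returning a canned value.
class HumiditySensorTask extends SensorTask {
    long eval() {
        return 42; // stand-in for a hardware reading
    }
}

class SensorTaskSketch {
    public static void main(String[] args) {
        SensorTask task = new HumiditySensorTask();
        System.out.println(task.eval());
    }
}
```

The sense() method in SenseManager would additionally need to be updated to activate the new task, as the paragraph above notes.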

Result table operations are implemented in the "SSDQP Relational Database (srdb)" sub-module. It defines the basic data container – the Table – as well as a number of Tasks that perform the operations.

<statement> ::= <maintenance> | <query>
<maintenance> ::= <kill> | <sync> | <route>
<kill> ::= “KILL” <number>
<sync> ::= “SYNC” [ <number> ]
<route> ::= “ROUTE” { <number> }
<query> ::= “SELECT” <field> { “, ” <field> }
            [ “WHERE” <condition> ] [ “GROUPBY” <attribute> ]
            [ “START” <starttime> ] [ “EPOCH” <verbosetime> ]
            [ “RUNCOUNT” <number> ]
<field> ::= <attribute> | <aggregation>
<attribute> ::= “node” | “time” | “x” | “y” | “z” | “temp” | “light” | “sw1” | “sw2” | “*”
<aggregation> ::= <aggregator> “(” <attribute> “)”
<aggregator> ::= “SUM” | “AVG” | “MAX” | “MIN” | “COUNT”
<condition> ::= <comparison> [ <booleanop> <condition> ]
<comparison> ::= <comparisonexpr> | <conditiongroup>
<comparisonexpr> ::= <numericexpr> <comparator> <numericexpr>
<conditiongroup> ::= “(” <condition> “)”
<booleanop> ::= “AND” | “OR”
<numericexpr> ::= <numericconst> { <numfunct> <numericconst> }
<numericconst> ::= <field> | <number>
<comparator> ::= “<” | “<=” | “=” | “>=” | “>” | “!=”
<numfunct> ::= “*” | “/” | “+” | “-”
<number> ::= any integer in the range −2^63 to 2^63 − 1
<verbosetime> ::= <number> <timeunit> [ <verbosetime> ]
<timeunit> ::= “hour” | “hours” | “h” | “minute” | “minutes” | “m” | “second” | “seconds” | “s”
<starttime> ::= <fixedtime> | <relativetime>
<fixedtime> ::= “AT” <statictime>
<statictime> ::= <number> “:” <number> “:” <number> [ <date> ]
<date> ::= <number> “.” <number> “.” <number>
<relativetime> ::= “IN” ( <verbosetime> | <statictime> )

Table 1 BNF grammar definition for SSDQP query language at the start of the project


2.2 System Design Extensions

This project extends the existing SSDQP by incorporating the ERA clustering capability into

SSDQP. This extension is achieved by:

• Implementing “internal sensors” in order to provide a generic method of retrieving resource level. This is achieved by extending the Sensing module.

• Implementing a mechanism that monitors the current resource level and notifies any resource-aware components of any changes that have been detected. The Resource Monitor module was introduced into SSDQP to provide this functionality.

• Providing a generic mechanism to store intermediate results (buffered, or aggregated by some algorithm like ERA cluster), provided by the new Storage Point sub-module.

• Implementing the ERA clustering algorithm as prototyped. This is achieved by the new Clustering sub-module.

• Extending the query language, by amending the Grammar module, in order for these new features to be used by the operator.

In addition, the start up routine was also updated in order for the Resource Monitoring module

to be started automatically.

2.2.1 Grammar Module

Amendments to the Grammar module are necessary in order to enable the new features to be

used in the query language. In this project, the existing module was extended to allow users to:

• specify any of the three new internal sensors (CPU idle time, free memory and battery level) to be used,

• specify whether to return the sensed data to the base station or to buffer them locally,
• specify whether the buffered data should be summarised using the clustering algorithm, and
• specify the source of the data to retrieve from (sensors or the buffered readings).

2.2.2 Sensing Module

The Sensing module was extended by adding new types of SensorTask to retrieve the current

battery level, free memory and the CPU utilisation. These tasks are invoked as the leaf node in

the recursive descent execution tree.

The Sensing module was also extended to allow buffers to be "sensed". Future extensions to this module (in conjunction with the Storage Point module) should create a new "Sensor" Table, removing the decision of where to sense for data by implicitly selecting the data source through polymorphism.

Figure 4 SSDQP Extensions (green – new addition, blue – modification)


2.2.3 Resource Monitor Module

The Resource Monitor module was added to the existing SSDQP project mainly for the use of

the resource-aware clustering algorithm that was integrated into this system in this project.

However, the module interface uses the well-known publisher-subscriber relationship,

implemented using Java call-back methods. This allows any other part of the SSDQP to

become resource-aware by subscribing to the Resource Monitor.

In order to reduce code redundancy and to fully utilise the scheduler for task scheduling, the Resource Monitor module is designed as two separate parts: a singleton Resource Table, which is a single-row Table that notifies all its subscribers whenever it is updated, and a periodic query that senses the current resource levels and updates the Resource Table.

While these two parts are extremely simple to code (the ResourceTable is a standard publisher in a publisher-subscriber relationship, and the recurring query requires no extra code – just a Token string for the query, which the Grammar module can easily decipher), they provide a trigger that allows advanced resource-aware algorithms to be implemented in the various components of the nodes.
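A minimal sketch of this publisher-subscriber arrangement follows; the interface and class names are assumptions, not SSDQP's actual identifiers, and only free memory is modelled to keep the sketch short:

```java
import java.util.ArrayList;
import java.util.List;

// Call-back interface implemented by any resource-aware component.
interface ResourceSubscriber {
    void resourcesChanged(long freeMemory);
}

// Singleton single-row "table" of resource levels; every update triggers
// a call-back to all registered subscribers.
class ResourceTableSketch {
    private static final ResourceTableSketch INSTANCE = new ResourceTableSketch();
    private final List<ResourceSubscriber> subscribers = new ArrayList<ResourceSubscriber>();

    private ResourceTableSketch() { }

    static ResourceTableSketch getInstance() { return INSTANCE; }

    void subscribe(ResourceSubscriber s) { subscribers.add(s); }

    // Called by the periodic monitoring query with freshly sensed levels.
    void update(long freeMemory) {
        for (ResourceSubscriber s : subscribers) {
            s.resourcesChanged(freeMemory);
        }
    }
}
```

The real Resource Table also carries CPU and battery levels; the point of the sketch is only the call-back trigger that any component can subscribe to.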

2.2.4 Storage Point Sub-module

The Storage Point sub-module of the sensing module contains the code to define and retrieve

specific storage points for use as a buffer (clustering or otherwise).

This module contains only one class – StoragePoint, which is a singleton. It creates storage locations ("pigeon holes") for data storage; each storage location is identified internally by a location number.

The addLocation() method in the StoragePoint creates a new storage location to store data,

while the getLocation() method retrieves the Table currently stored in the specified location.

This Storage Point module is designed to store any type of data Table. Aggregation and other stream processes can be simplified by implementing them as a subtype of Table. By using StoragePoint to store their intermediate states between invocations, the developer need not worry about how these processes are scheduled in SSDQP.
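A sketch of the StoragePoint described above; the method names follow the report, but the signatures and the reduced Table interface are assumptions:

```java
import java.util.Hashtable;

// Marker stand-in for SSDQP's data Table interface.
interface Table { }

// Singleton set of numbered "pigeon holes" mapping an integer location
// number to whatever Table (plain buffer, ERA cluster, ...) is stored there.
class StoragePointSketch {
    private static final StoragePointSketch INSTANCE = new StoragePointSketch();
    private final Hashtable<Integer, Table> locations = new Hashtable<Integer, Table>();
    private int nextId = 0;

    private StoragePointSketch() { }

    static StoragePointSketch getInstance() { return INSTANCE; }

    // Creates a new storage location holding the given Table and
    // returns its internal numeric identifier.
    synchronized int addLocation(Table initial) {
        int id = nextId++;
        locations.put(Integer.valueOf(id), initial);
        return id;
    }

    // Retrieves the Table currently stored at the specified location.
    Table getLocation(int id) {
        return locations.get(Integer.valueOf(id));
    }
}
```

Because only the Table interface is used, a specialised Table such as the ERA cluster can be stored exactly like an ordinary buffer.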

2.2.5 Clustering Sub-module

This module contains all necessary code to perform the functions of the ERA clustering as

described in [2], [3].

This module contains three classes, of which only one (ERA_Cluster) is used externally by the rest of the program. The ERA_Cluster class is adapted from the original ERA_Cluster prototype and was made into a type of data Table (which is used throughout the rest of SSDQP) by implementing extra adapter methods. This allows the class to be stored in the Storage Point module for easy access. The class achieves resource awareness by registering itself as a subscriber to the Resource Monitor module.

In order to use this module, a new Task was created in the Grammar module that will cause a

new cluster to be created when a clustering query is executed.

2.3 System Implementation

It should be noted that the modules described in this document do not directly correspond to the actual Java packages in the implementation. This document identifies the conceptual modules in SSDQP, in line with the original design as described in the original documentation [8]. These modules may be separated into two or more Java packages in the implementation of both SSDQP and the extensions. This allows the programmers to further restrict the namespace of the classes when implementing the design.

2.3.1 Grammar Module

Table 2 identifies the extensions made to the SSDQP query language to enable the new features to be used.

<query> ::= “SELECT” <field> { “, ” <field> } [ “FROM” <location> ]
            [ “INTO” <location> [ “CLUSTER” <radius> ] ]
            [ “WHERE” <condition> ] [ “GROUPBY” <attribute> ]
            [ “START” <starttime> ] [ “EPOCH” <verbosetime> ]
            [ “RUNCOUNT” <number> ]
<field> ::= <attribute> | <aggregation>
<attribute> ::= “node” | “time” | “x” | “y” | “z” | “temp” | “light” | “sw1” | “sw2” | “battery” | “cpu” | “memory” | “*”
<location> ::= any alphanumeric string
<radius> ::= <number>

Table 2 SSDQP Language extensions – the rest of Table 1 remained unmodified
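For illustration, queries using the extended syntax might look like the following (the location name and values are invented):

```
SELECT node, time, temp, memory
INTO readings CLUSTER 5
EPOCH 1 minute
RUNCOUNT 60

SELECT * FROM readings
```

The first query samples the temperature and free memory once a minute, clustering the results locally into the storage point named "readings" with a threshold radius of 5; the second retrieves the summarised contents of that buffer.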

To implement this change, step 1 of the query parsing process (translating to network Tokens) was extended by first extending the language defining the Tokens. The Parser and the Compiler were then extended to recognise the new syntax by adding new methods that deal with the new syntax and calling these new methods from the main Parser method.

Step 2 of the query parsing process (constructing an execution tree from network Tokens) was extended by first creating the new Tasks required (for example, the Task that writes data into a buffer, which is used in lieu of forwarding the results back to the base station). The TokenParser was then updated to recognise the new Token syntax and create the new Tasks as necessary. The use of the Composite design pattern and the recursive descent nature of the task execution greatly simplified the amendments in this module.

2.3.2 Storage Point Sub-module

In order to conserve memory and reduce the amount of data transmitted across the network, the Storage Point was designed to use integer identifiers internally to identify each storage point. Currently the Storage Point is implemented as a hash table stored in RAM, mapping the identifier to a data Table. The internal implementation of the data Table is irrelevant, as the Storage Point uses only its interface, allowing any specialised data Table such as the ERA Cluster to be stored in addition to the standard buffer Tables. Future extensions to this module could easily add support for the Storage Point to save its contents to flash memory in order to protect against random node restarts.

However, the use of numeric identifiers by users is both unintuitive and error prone. Therefore the SQL query language uses an alphanumeric string to identify storage points.

For this to operate, a translator is required to translate the alphanumeric identifier into a numeric identifier. The implementation of such a translator is simple and is based on a HashMap. There are two places where this translator could be placed – either on the host machine or on the client. Since multiple clients may connect to the same host, identifier clashes may occur if the translator is placed in the clients. Therefore, the translator is placed in the host application along with the translator for attribute names.
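The translator amounts to little more than a HashMap that allocates identifiers on first use; a minimal sketch (class and method names are mine, not from the SSDQP source):

```java
import java.util.HashMap;
import java.util.Map;

// Host-side mapping from user-visible alphanumeric storage-point names to
// the compact numeric identifiers sent over the network.
class LocationNameTranslator {
    private final Map<String, Integer> ids = new HashMap<String, Integer>();
    private int nextId = 0;

    // Returns the numeric identifier for a name, allocating a fresh one
    // the first time the name is seen.
    synchronized int translate(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = Integer.valueOf(nextId++);
            ids.put(name, id);
        }
        return id.intValue();
    }
}
```

Keeping a single instance on the host avoids the identifier clashes that per-client translators would cause.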

In the current implementation, this translator is only stored in the host computer's memory and is therefore not persistent. In order to allow the host to be restarted without losing this critical metadata, it is advised that future extensions to SSDQP enable this metadata table to be dumped into persistent storage such that the metadata can be restored in the event of the host application restarting.

2.3.3 Sensing Module

The Sensing module was extended by adding three new SensorTasks to retrieve the resource levels – CPUTask (CPU idle time), MemoryTask (free memory) and BatteryTask (remaining battery level). Since the bSPOT hardware only supports the calculation of free memory, through the standard JVM status methods, only MemoryTask was fully implemented. The remaining Tasks are inserted into the system as dummy classes.

To determine whether a task requires the activation of sensors or just a read from the buffer, the main sense() method was updated. The sense() method now defaults to retrieving from the buffer, unless the storage location indicates that the values should be retrieved from the actual sensors. This is an interim solution; a more elegant design would implement the sensing operation inside a “SensorTable” stored within the storage point, allowing the program to implicitly perform the correct operation without checking the intention of the query.
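The dispatch described above can be sketched as follows. The constant, field and method names are hypothetical stand-ins, not SSDQP's actual sense() implementation or storage-location encoding.

```java
/**
 * Hypothetical sketch of the sense() dispatch: default to the buffer,
 * and only activate the hardware sensors when the storage location
 * explicitly asks for a live reading.
 */
class SensingSketch {
    static final int LIVE_SENSORS = 0;  // storage location meaning "read the hardware"

    private final int storageLocation;

    SensingSketch(int storageLocation) {
        this.storageLocation = storageLocation;
    }

    double[] sense() {
        if (storageLocation == LIVE_SENSORS) {
            return readHardwareSensors();
        }
        return readBuffer(storageLocation);
    }

    // Stand-ins for the real sensor and storage-point reads.
    double[] readHardwareSensors() { return new double[] { 42.0 }; }
    double[] readBuffer(int loc)   { return new double[] { 0.0 }; }
}
```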

2.3.4 Resource Monitor Module

The Resource Monitor module was implemented as specified in section 2.2.3. In addition, a

new type of Task was also added to the Scheduler such that the run count (number of iterations

before terminating the task) is not decremented. This allows the recurrent monitoring task to run indefinitely.
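The non-decrementing run count can be sketched as below. The class and field names are illustrative, not the actual Scheduler code.

```java
/**
 * Illustrative sketch of a scheduler task whose run count is only
 * decremented for ordinary tasks; an "indefinite" (monitoring) task
 * stays scheduled forever.
 */
class ScheduledTask {
    private int runCount;
    private final boolean indefinite;

    ScheduledTask(int runCount, boolean indefinite) {
        this.runCount = runCount;
        this.indefinite = indefinite;
    }

    /** Run one iteration; returns true while the task should stay scheduled. */
    boolean tick() {
        // ... perform the task's work here ...
        if (!indefinite) {
            runCount--;          // ordinary tasks count down to termination
        }
        return indefinite || runCount > 0;
    }
}
```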

2.3.5 Clustering Sub-module

As described in sections 1.1.2 and 2.2.5, the Clustering sub-module implements the ERA

Clustering algorithm in order to summarise measurements into value ranges and their frequencies.

Figure 5 describes the operation of the algorithm as implemented in the Clustering sub-module.

When a new measurement is to be clustered, the algorithm searches through the set of all

clusters, calculating the Manhattan distance between the mean of each cluster and the new

measurement. If the measured values are within the specified threshold radius from the closest

cluster, then the measurement is added to that cluster by updating the mean of the cluster and

incrementing the count. If no such clusters are found, then a new cluster is established centred

at the measurement with the count set to 1. The use of Manhattan distance allows the

algorithm to apply in a multi-dimensional case, making the algorithm much more generic.
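The clustering step just described can be sketched as follows. This is an illustrative reimplementation, not the ERA prototype code, and it is written against standard Java rather than the restricted CLDC profile used on the nodes.

```java
import java.util.List;

/**
 * Illustrative sketch of the clustering step: find the nearest cluster
 * by Manhattan distance; absorb the measurement if within the radius,
 * otherwise start a new cluster centred at the measurement.
 */
class ClusterSketch {
    double[] mean;   // running mean of the absorbed measurements
    int count;       // number of measurements in this cluster

    ClusterSketch(double[] first) {
        mean = (double[]) first.clone();
        count = 1;
    }

    /** Manhattan distance generalises the algorithm to any dimension. */
    static double manhattan(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    /** Fold a measurement into this cluster, updating the mean incrementally. */
    void absorb(double[] x) {
        count++;
        for (int i = 0; i < mean.length; i++) {
            mean[i] += (x[i] - mean[i]) / count;
        }
    }

    /** Add x to the nearest cluster within the threshold radius, else start a new one. */
    static void cluster(List<ClusterSketch> clusters, double[] x, double radius) {
        ClusterSketch nearest = null;
        double best = Double.MAX_VALUE;
        for (ClusterSketch c : clusters) {
            double d = manhattan(c.mean, x);
            if (d < best) { best = d; nearest = c; }
        }
        if (nearest != null && best <= radius) {
            nearest.absorb(x);
        } else {
            clusters.add(new ClusterSketch(x));
        }
    }
}
```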

Resource adaptation in this sub-module is implemented by varying the “processing granularity”

(accuracy of the aggregation operation) and the “output granularity” (size – hence the

precision – of the clusters) as described in [4].

Figure 5 ERA Clustering algorithm


In the event of high CPU utilisation, instead of searching through the entire set of clusters,

only a random subset of clusters, which is a specified proportion of the entire set, is searched.

This reduces the load on the CPU, but may cause extra clusters to be formed unnecessarily.

In the event of low free memory level, the algorithm attempts to reduce the size of the set of

clusters. The algorithm iterates through the entire set to locate and remove outlier clusters as

well as clusters that have not been used for at least a specified amount of time. The radii of the

clusters are also increased up to an upper bound, thus decreasing precision of the algorithm.

When the resource levels return to the normal range (below the threshold level), these adaptations are cancelled and the algorithm returns to operating normally.
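The CPU adaptation (searching only a random proportion of the cluster set) might look like the following sketch; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Illustrative sketch of the CPU adaptation: under high CPU load,
 * restrict the nearest-cluster search to a random subset of clusters
 * that is a specified fraction of the whole set.
 */
class CpuAdaptationSketch {
    static <T> List<T> searchSubset(List<T> clusters, double fraction, Random rnd) {
        int keep = Math.max(1, (int) (clusters.size() * fraction));
        List<T> pool = new ArrayList<T>(clusters);
        List<T> subset = new ArrayList<T>();
        for (int i = 0; i < keep && !pool.isEmpty(); i++) {
            // Pick without replacement so the subset has distinct clusters.
            subset.add(pool.remove(rnd.nextInt(pool.size())));
        }
        return subset;
    }
}
```

The nearest-cluster search then runs over the returned subset rather than the full set, trading a chance of unnecessary extra clusters for lower CPU load.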

However, the “input granularity” (sampling period) adaptation to low battery level was not

implemented in the ERA cluster. This is a deliberate decision due to the fact that the cluster,

conceptually being a data container, should not be able to control the sampling period of a

query. The input granularity adaptation should therefore be implemented in the scheduler or in

the actual queries instead.

[Figure 5 flowchart: for each set of new values, the distance to every cluster is calculated and the clusters with the minimum distance are determined. If that distance is within the threshold, the values are randomly added to one of these clusters; otherwise the values are added to a new cluster. CPU adaptation restricts the search to a subset of clusters; memory adaptation increases cluster radii and reclaims unused clusters.]

2.3.6 Start-up Routine

The start-up routine that is executed by the sensor node when it is switched on was amended such

that the resource monitoring module is started at the first opportunity (as soon as the node is

synchronised with the network, which allows the exact system time to be determined for

scheduling purposes).


3 Background on Tools and Concepts

3.1 Artifacts used

3.1.1 Hardware

3.1.1.1 Development Platform

The hardware used in this project includes an Acer Aspire 3627WXMi laptop (1 GB DDR2

memory, 1.7GHz processor – can only run at 1.3GHz maximum, running Windows Vista

Business Edition). This computer was used due to budget constraints – the author could not secure

funding from the School of Information Technologies to purchase a new personal computer

purely for this project.

3.1.1.2 Wireless Sensor Node

The sensor node used in this project is the beta-release Sun SPOT (bSPOT) (See Figure 1). This

node is used because this is the node the School of Information Technologies is currently

conducting research on. The most notable feature of this sensor node is that it runs Java as its

native programming language – “runs Java on bare metal”. This allows sophisticated algorithms to be implemented on the sensor nodes easily. The specifications for a Sun SPOT device are listed in Table 2.

Figure 6 Components of Sun SPOT (eSPOT) [13]

Sun SPOT Processor Board

• 180 MHz 32 bit ARM920T core - 512K RAM/4M Flash

• 2.4 GHz IEEE 802.15.4 radio with integrated antenna

• USB interface

• 3.7V rechargeable 720 mAh lithium-ion battery

• 32 uA deep sleep mode

General Purpose Sensor Board

• 2G/6G 3-axis accelerometer

• Temperature sensor

• Light sensor

• 8 tri-color LEDs

• 6 analog inputs

• 2 momentary switches

• 5 general purpose I/O pins and 4 high current output pins

Squawk Virtual Machine

• Fully capable J2ME CLDC 1.1 Java VM with OS

functionality

• VM executes directly out of flash memory

• Device drivers written in Java

• Automatic battery management

Developer Tools

• Use standard IDEs, e.g. NetBeans, to create Java code

• Integrates with J2SE applications

• Sun SPOT wired via USB to a computer acts as a base-station

Table 2 Specifications of the Sun SPOT (taken from [13])

3.1.2 Compilers and Language Tools

3.1.2.1 Programming Language - Java

SSDQP and its extensions are written entirely in Java. Java is used due to the native support for

the language by the sensor node. This greatly simplifies the development of new code for the

sensor nodes.

3.1.2.2 Java Compiler

The standard Java SE 1.6.0_01-b06 compiler was used to compile the project. This compiler

was used because it is the most current compiler for Java, and it maintains good backward


compatibility with previous versions of Java. The improved GUI API in Java SE 6.0 was also

used in the existing SSDQP project, and thus is a requirement in order to compile the existing

SSDQP code.

When this compiler is used to compile the code for deployment into the Sun SPOTs, the

“source level” of the compiler is restricted to Java SE 1.3 in order to maintain compatibility with

the Java (Squawk) Runtime used in the Sun SPOTs.

3.1.2.3 Sun SPOT Libraries and Deployment Tools

The Java (Squawk) library and the Sun SPOT libraries were used in this project as they are

required for compiling code that can be run on Sun SPOTs. The Sun SPOT libraries also allow

the software to interface with the Sun SPOT hardware on each node.

Ant is used to compile, “suite” (collect all required class files and components of jar files into an image file) and deploy (write the image file to the node) the code, as well as to run the host (computer-based base station logic), the base station (a “tap” into the radio interface) and the

nodes. Ant is used as it is the default tool used by Sun SPOT, and it is easily configurable to the

project’s needs.

3.1.2.4 Source Code Repository

Subversion (SVN) was used to store the source code. A separate branch for this project was

created from the main SSDQP trunk in order to allow independent, concurrent development on

the SSDQP. SVN was used because it is a very common, fully featured source code versioning

tool, with many open-source add-ons to common IDEs.

3.1.2.5 Integrated Development Environment

Eclipse was used to write the source code for the project. Eclipse was used due to it being freely

available, and the author’s familiarity with the tool. Easy access to the source code repository

was enabled by using an open source add-on called Subclipse. This allows the repository to be

manipulated without leaving the IDE.

3.2 Coding Methods

3.2.1 Design Approach

This project was developed using an object-oriented approach. This approach is used because

the chosen Wireless Sensor Node uses Java as its native language, and Java is designed for object

oriented programming. In addition, the polymorphism feature in object oriented design also

made the recursive descent parsing used extensively in the SSDQP easier. Polymorphism also

simplifies the coding required to extend the SSDQP.

3.2.2 Design Pattern

The Singleton pattern is used extensively throughout the project in order to minimise the overhead of creating new objects. Many classes used in this project require only one instance throughout the

process’s lifetime.
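A minimal lazy Singleton in the style described is shown below; the class name is illustrative, not an actual SSDQP class.

```java
/**
 * Illustrative lazy Singleton: the single instance is created on first
 * use and every later call returns the same object.
 */
class SchedulerSingleton {
    private static SchedulerSingleton instance;

    private SchedulerSingleton() { }        // no public construction

    static synchronized SchedulerSingleton getInstance() {
        if (instance == null) {
            instance = new SchedulerSingleton();
        }
        return instance;
    }
}
```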

The Adapter pattern is used in this project to couple the existing clustering prototype with SSDQP. Using an adapter instead of reimplementing the algorithm from scratch minimises the number of errors that may arise from the implementation. This pattern also allows the author to isolate a problem, if one exists, to the original implementation of the clustering algorithm.

The Composite design pattern is used primarily for the recursive descent parsing. It allows the

task to be decoupled into different stages for execution, simplifying the code.


The Command pattern is also used in this project, and is used as part of the parsing tree to

execute certain standard actions as required by the parsed query. This allows new actions to be

added into the SSDQP easily.
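The interplay of the Composite and Command patterns in the execution tree can be illustrated as follows; the class names are hypothetical, not SSDQP's actual types.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative combination of Command and Composite: each parsed token
 * becomes a command, and composite nodes execute their children in
 * order, mirroring a recursive descent execution tree.
 */
interface Command {
    void execute(StringBuilder log);
}

class LeafCommand implements Command {
    private final String name;
    LeafCommand(String name) { this.name = name; }
    public void execute(StringBuilder log) { log.append(name).append(';'); }
}

class CompositeCommand implements Command {
    private final List<Command> children = new ArrayList<Command>();
    CompositeCommand add(Command c) { children.add(c); return this; }
    public void execute(StringBuilder log) {
        for (Command c : children) c.execute(log);   // recursive execution
    }
}
```

Adding a new action to the system then amounts to writing one new Command implementation and wiring it into the tree built by the parser.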

3.3 Evaluation Methods

In order to verify the software, tests are conducted at several levels using different strategies. Automated unit testing using JUnit was used throughout the development of the project.

This allows code segments that are not performing in accordance with the specifications to be

detected early in the development stage and before deployment.

Scripted test cases were also used to test features of the program, sometimes requiring operator

intervention. In cases where the code being tested makes references to the configurations of

the node, which are unavailable in the desktop development platform, operator intervention is

required to skip certain lines of code. Commenting out the line and running the script would be

inappropriate as it may leave the system vulnerable to bugs if some commented out lines have

not been uncommented before deployment.

Human tests are used to test the operation of the program after the code had been deployed to

the nodes. This is required because the code is being run remotely and is not accessible by

debuggers. In order to test the implemented features, test queries are specially crafted to test

specific features both in isolation and in conjunction with other features.

Tests have not been conducted for the multi-hop scenarios (when a message is required to be

routed through intermediate nodes) due to the physical range required to separate the nodes sufficiently to necessitate multi-hop routing.


4 User Operations

4.1 User Operations

4.1.1 Returning the current battery level as a special type of “internal sensor”

4.1.1.1 Inputs

Specify the keyword “BATTERY” as part of the SELECT clause of the SQL statement:

SELECT BATTERY FROM sensors

Note that the “FROM sensors” clause is optional as it is the default location to retrieve data.

4.1.1.2 Expected Outcomes

The battery level of each sensor as a single column table

4.1.2 Returning the current amount of free memory as a special type of “internal

sensor”

4.1.2.1 Inputs

Specify the keyword “MEMORY” as part of the SELECT clause of the SQL statement:

SELECT MEMORY FROM sensors

Note that the “FROM sensors” clause is optional as it is the default location to retrieve data.

4.1.2.2 Expected Outcomes

The free memory of each sensor as a single column table

4.1.3 Returning the current CPU utilisation as a special type of “internal sensor”

4.1.3.1 Inputs

Specify the keyword “CPU” as part of the SELECT clause of the SQL statement:

SELECT CPU FROM sensors

Note that the “FROM sensors” clause is optional as it is the default location to retrieve data.

4.1.3.2 Expected Outcomes

The CPU utilisation of each sensor as a single column table

4.1.4 Specifying the node to buffer the measured values instead of forwarding the results

4.1.4.1 Inputs

Specify a buffer location to store the result as part of the query:

SELECT * INTO somewhere

4.1.4.2 Expected Outcomes

Nothing – Results should be buffered locally on each node instead of being returned to the user.


4.1.5 Specifying the node to retrieve data stored in a storage point (buffer or cluster)

4.1.5.1 Inputs

Submit a query to retrieve the data from the buffer:

SELECT * FROM somewhere

4.1.5.2 Expected Outcomes

The current content stored in the storage point.

4.1.6 Specifying the node to return a clustered summary of the measured values

4.1.6.1 Inputs

The user specifies a clustering buffer to aggregate measured values, then retrieves data from the

clustering buffer:

SELECT * INTO somewhere CLUSTER <radius> (eg. SELECT * INTO somewhere

CLUSTER 5)

SELECT * FROM somewhere

4.1.6.2 Expected Outcomes

The summary of the sensor readings in a table containing the mean of each attribute for each

cluster, together with the number of elements in that cluster. A cluster is defined as a group of

measurements that are within a specified n-dimensional radius from the mean value as

calculated using the Manhattan distance formula.

4.1.7 Resource monitoring

4.1.7.1 Inputs

Nothing.

4.1.7.2 Expected Outcomes

Whenever the resource levels satisfy some critical conditions, the user may notice a change in node performance in order to accommodate the node’s condition. Refer to section 2.3.5 for

details on the adaptation implemented by the clustering algorithm.

4.2 Operational Scenarios

4.2.1 SSDQP setup (required before any requests may be made into the network)

Note: This guideline assumes that the relevant code had been compiled and deployed.

1. Attach the base station node to a free USB port on the base station.

2. Start the base station node by typing “ant host-run” in a command prompt opened in the SSDQP root directory.

3. Open another command prompt. Start the GUI by changing into the GUI directory and typing “ant run”.

4. Connect to the base station node by setting the correct configuration and clicking “Connect”.

5. Run the nodes by switching them on. When the nodes start up, they will automatically start the SSDQP.

6. Build the routing tree by clicking “Build routing tree” and selecting the desired power level.

7. Synchronise node timers and start the resource monitor by clicking “Synchronise time”.

8. New queries may now be entered by clicking “Enter query”. A new entry should be added for that task.

9. Return values (if any) can be viewed by double clicking on the query item (for the query “SELECT *”).

4.2.2 System Usage

Note: Since the GUI is not designed to work with the new extensions, the results retrieved may not display in the Result Browser in the GUI. To observe these operations, one

currently needs to examine the debug output to see the returned table. This needs to be fixed in

any future extension as a priority.

4.2.2.1 Buffered Storage Point

In Step 8 described above in section 4.2.1, enter the query “SELECT * INTO blah EPOCH 5s

RUNCOUNT 10”. This will create a new query that is run every 5 seconds for 10 times, each

time storing all sensed values into the storage point called “blah”.

If the query “SELECT * FROM blah” is issued within 50s of the previous query, the previous

query will still be in progress and the node will return the results collected so far.

If the query “SELECT * FROM blah” is issued again after the first query had completed, it will

only return the results that had not previously been returned.

4.2.2.2 Resource Sensor

Enter the query “SELECT *” to retrieve the current measured values of all sensors. The

current battery level, percentage of CPU idle time and percentage of free memory are returned. Currently, the battery level and the CPU idle time will always return 100%

due to the lack of hardware support for these features.


4.2.2.3 ERA Clustering

In Step 8 above, enter the query “SELECT light INTO blah2 CLUSTER 1 EPOCH 10s

RUNCOUNT 10”. This query requests the nodes to store the light sensor readings into the

clustering buffer called “blah2” every 10 seconds for 10 times. While the query is being

executed, vary the amount of light incident onto the sensor (note that readings are only taken

every 10 seconds). This allows the effect of the clustering algorithm to be observed. Please

note that variable cluster radius is not currently implemented as it was not implemented in the

original prototype. The radius field is included for future extension.

At any time, if the query “SELECT * FROM blah2” is issued, it will return the average value of

the clusters, and the number of items in each cluster at that point in time. Unlike the buffered

storage, the buffer is not reset every time the buffer is read, but will continue to aggregate.


5 Evaluation

5.1 Tests for Correctness

In order to verify the software correctness, tests are conducted at several levels using different

strategies.

Before being tested on the actual nodes, unit tests are conducted on methods and classes to ensure that these new additions/amendments will function as intended. These new additions/amendments are then integrated into the SSDQP on the development platform, and unit tests are conducted on the methods that will be using these changes. For some of these changes (for example, changes to the parser), these unit tests need to be stepped through manually, since the existing SSDQP code includes references to methods that are specific to the nodes and not available on the desktop, thus requiring human intervention to break from those routines.

After the code is unit tested as a whole, the new source code is then deployed to the sensor

nodes and acceptance tested. At this stage, since the code is now executing remotely and on a

reduced library, the author loses the ability to run/stop the process and is unable to use debuggers. Debugging of the code becomes difficult (for example, exceptions do not contain

reference to line numbers or method names when thrown).

Since the only interface that is available between any user (including the author) and the

process being tested is an outdated GUI that does not have scripting ability, only human tests

can be conducted. To conduct these tests, specially crafted queries designed to test a certain

feature are sent via the GUI, and the return values from the node are visually compared to an

expected range of values to determine the validity of the result.

The following sections explain the tests conducted for each addition/amendment as well as a

summary of the results obtained.

5.1.1 SQL Parser / Compiler

The SQL Parser and the Token Compiler sub-modules are pre-existing sub-modules developed

in SSDQP and extended in this project. The tests conducted test only for the correct operation

of the extended features.

The SQL Parser is responsible for the lexical analysis of a SQL query, and then decomposing

the query into a map of clauses. The Token Compiler is responsible for the construction of the

Token string that is sent across the radio interface based on the decomposed query.

Since these two sub-modules are never run in isolation, they were tested as a unit. In addition to

the normal JUnit tests, the two sub-modules are tested as a whole by using them to construct a

Token stream based on SQL queries that utilise the new features. The exceptions thrown are

monitored and the resultant Token string is compared to the expected values. Table 3 gives a

sample of the query strings and their expected compiler results.

It should be noted that in running these tests, operator intervention is required as the Token

Compiler requests the Sun SPOT framework to return a valid power level in order to construct

the network packet. Since unit tests bypass the Sun SPOT framework, the function call will

throw an exception. Therefore only the resultant Token string, not the actual packet, is checked for correctness. Correctness of the packet can be inferred if the Token Parser can

correctly read the packet.


SQL Query                        Token String
SELECT * FROM sensors            D(M(C() E(0)))
SELECT * FROM resources          D(M(C() E(1)))
SELECT BATTERY, CPU, MEMORY      D(M(C() P(E(0) 10 11 12)))
SELECT * INTO blah               I(M(C() E(0)) 2)
SELECT * INTO blah CLUSTER       R(M(C() E(0)) 2)
SELECT * FROM blah               D(M(C() E(2)))

Table 3 Sample compiler results

The tests indicated that the extension to the grammar had been accurately implemented by

both the SQL Parser and Token Compiler.

5.1.2 Token Parser

The Token Parser is responsible for creating the recursive descent execution tree from the Token

string embedded in the packet received. In addition to the JUnit tests, correct execution of the

parser is determined by using the parser to construct the execution tree for known valid Token strings that use the features added/changed in this project. The sub-tree containing the new/changed features (not the whole tree) is then checked manually against the expected structure. Only the sub-tree is verified, as the correct operation of the existing code was assumed and the new features are implemented in a separate command object.

Tests indicated that the additional features enabled in the revised language can be constructed

into a proper execution tree.

5.1.3 Storage Point

The Storage Point module was comprehensively tested using JUnit on the desktop. No integration

test was possible without first deploying the code. The deployed code was tested as a part of the

integration test. SQL queries that write into and read from the storage points were used in the

final integration test.

JUnit tests indicated that the storage point is operating correctly.

5.1.4 Enhanced Resource Aware Cluster

The algorithm behind the Cluster (which was developed by another developer on a separate

project) was assumed to be correct by virtue of it being published in a conference. This project

extended the Cluster by creating an adapter for the Cluster such that it may be used as a storage

object – i.e. a specialisation of the normal data table.

Only the adapter methods have been tested, and were only tested to ensure the correct

methods and parameters have been passed. No tests have been conducted to verify the

implementation of the algorithm.

JUnit tests indicated that the adapter methods for the Cluster are operating correctly.

5.1.5 Resource Sensors

Resource sensors are new command objects that enable the “sensing” of the internal resource levels. Of the three sensors, two are dummy classes because these resources cannot be determined on bSPOTs (but can be determined on the newer eSPOTs). The remaining sensor, the free memory sensor, cannot be tested reliably.

In order to test the memory sensor qualitatively, after the sensor had been deployed, a query was issued to continuously add new entries to the buffer, while another query was issued to retrieve the current free memory level. The test showed that the reported memory level is consistent with the aggressive use of memory by the other query.

5.1.6 Resource Monitor (Administrative Task)

The Resource Monitor is a combination of a self-starting, recurring query and a publisher that notifies the clusters whenever the resource level is updated. The self-starting nature of the query was tested only in the final integration test, by issuing a query to retrieve the stored resource level after the node had started up: if the query had not started, the buffer would not have been initialised. The recurring nature was tested by issuing the query repeatedly in order to look for whether any change in the resource level could be detected.

The correctness of the publisher-subscriber interface was verified using code tracing arguments. The actual operation is difficult to test due to the inability to efficiently

change the resource level of the node and the lack of clearly observable effects of the resource

adaptations.

5.1.7 Final Integration Test

The final integration test was conducted after all unit tests have been completed. It verifies the

correct operation of the system as a whole.

This test revealed a possible bug that was not previously detected in the Clustering

prototype. It also found that the default “collection window” (time to wait for all children to

respond) may not be long enough.

5.2 Tests for Performance

In this experimental system, performance of the system is not currently a critical factor.

Therefore no explicit tests for performance have been conducted.

However, performance inefficiencies have been observed during the final integration test. It

was observed that the time required to execute a query is close to the 5 seconds collection

window. If multiple queries have been scheduled to run at times close to each other, the “processing” time may at times exceed the 5-second collection window, causing the host to interpret a nil return. It is suggested that the inefficiency may be due to the fact that all

sensors are polled for each sensing operation instead of activating only the required sensor.

5.3 Limits on the Extension

Currently the Storage Point extension is very primitive. Whenever a location is added using a location number that is already in use, it assumes that the query wants to append the data in the new table to the existing table. While this behaviour is useful for most

applications, if the type of the table (for example, clustering vs non-clustering) is different or

if the number of columns is different, it may produce unexpected results. The current

implementation is also unable to determine whether any storage location is disused and could

safely be reclaimed. This may lead to a type of “memory leak”.

The location number generator is also prone to overflow, as it uses a finite-length (32-bit) int.
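This overflow behaviour is easy to demonstrate: in Java, incrementing a 32-bit int past Integer.MAX_VALUE silently wraps around to Integer.MIN_VALUE, so a long-running id counter would eventually hand out negative (and then repeated) location numbers. The class below is illustrative only.

```java
/**
 * Demonstration of the overflow noted above: Java int arithmetic
 * wraps silently rather than failing.
 */
class OverflowDemo {
    /** A naive "next id" generator, as a location number counter might use. */
    static int next(int lastId) {
        return lastId + 1;   // wraps to Integer.MIN_VALUE at the top of the range
    }
}
```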

The language extension, while operating correctly, produces a fair amount of useless tokens

(for example, the token for collecting children’s results when all data are buffered within the

nodes), taking up both transmission time and execution time.


5.4 Evaluation of the Extension

While some limits exist on the extension of the SSDQP, the extension can serve quite well as a

test bed system for the wireless sensor network research in the School of Information

Technologies. In addition to the planned features, this project also delivered a framework in

which future extensions and modifications can easily take place. The construction of the

Storage Point, and the concept of extending the basic Table to provide advanced

functionalities instead of forming a new operator for advanced features allows new data

aggregation and manipulation algorithms to be added to the SSDQP easily. However, the

current extensions may not be too useful in longer-running tests at this stage, mainly due to the

“memory leak” problem.


6 Individual Reflections

Overall, the project provided an enjoyable yet valuable learning experience. In this project, the author was introduced to research in the active field of wireless sensor networks. In

addition, the author was able to apply his knowledge gained in his undergraduate studies to

contribute to the ongoing research project. The project was completed without major

problems, but a number of issues did come up, mainly at the start and towards the end of the

project.

The author had found that the design of SSDQP is relatively extensible. The documentation

provided for the system adequately explains the process required to extend the system.

However, the documentation did not concisely describe the full implementation of a new

feature (from extending the query language, creating new network Tokens, to creating new

Tasks/TaskSets). These extensions have been documented, but they are scattered throughout

the document. In addition, there is a very steep learning curve for the developer to extend the

system. The developer must understand the full 2-stage execution process, from parsing the

query language to the final recursive descent execution of the Tasks by the scheduler.

While the author needed to interact with the Squawk Java SDK regularly, the author only

needed to call 4 commands through Ant (ant deploy, ant run, ant host-compile, ant host-run).

The author’s exposure to Ant scripts greatly reduced the interactions needed, as the author could modify the standard script to minimise the actions required. The author also found that, due

to the design of SSDQP, he did not need to deal with any of the Squawk API. However,

debugging software running on the nodes was found to be quite difficult, with the exception messages identifying only the class in which the exception was thrown, not the actual method.

In terms of problems encountered, at the start of the project, miscommunication between the

author and the project supervisor delayed the actual start of the project. Initially, the author’s

understanding of the project was to extend the current SSDQP system by developing and

implementing a network-layer routing algorithm based on resource-aware clustering. Research

into that area was conducted, with at least 5 papers referenced to ascertain the various

techniques used in WSN routing.

The exact nature of the task was not defined at the start of the project, as the supervisor was waiting for the code and documentation of the clustering algorithm to be submitted by the author who was previously working on the prototype. This delayed the definition of the project objectives.

In an attempt to put the project back on schedule, specific objectives for the project were discussed in the meeting prior to report 1. It was then made clear that the intention of the project was actually to extend SSDQP by implementing a resource-aware data clustering algorithm for the efficient storage and transmission of collected data (not the formation of network topologies). This misunderstanding required the first report to be rewritten (the project plan needed to be overhauled), and more background research was needed, as the previous research was not very relevant to the task. However, this also gave the author the

opportunity to explore a different aspect of WSN which would otherwise not be investigated.

Towards the end of the project, it was noted that the specific requirements for the final report and the demonstration had not been disclosed to the author, and the report was due within 2 weeks. This matter was raised with the project supervisor. However, 5 days before the report was due, the report specification had yet to be disclosed. Using a guess-and-check method, the URL for the 2003 SOFT3200 project website was discovered, with a report 3 specification sheet


(lecture 3) available. Since the UoS coordinator is the same and the specification for report 1 is

the same (albeit the name had changed and the dates were 3.5 years earlier), the author decided to write this report based on the specification for 2003.

In a follow up discussion with the supervisor, it was identified that the author had neglected to

follow up on the matter after initially raising the issue. This matter demonstrated to the author

the importance of following up on requests.

In summary, with the exception of some slight problems encountered, the programming

experience was challenging yet enjoyable. SSDQP was well designed and is quite easy to develop

in despite the steep learning curve. The Squawk Java SDK was quite simple to use but is slightly

difficult to debug in.


7 Conclusion

In this project, the author has developed an extension to the SSDQP, a distributed query

processor for wireless sensor networks. The extension enables the system to aggregate

collected data using the Enhanced Resource-Aware (ERA) clustering algorithm. In developing this

extension, two supporting components were also developed and used by it: the Storage

Point facility for each node, and the Resource Monitor.

The Storage Point extension provides a flexible framework for the implementation of stream

processing algorithms, using the concept of encapsulating the algorithm into the data container

instead of defining new operators. This framework enables the algorithms to operate on ad

hoc input and to display the “intermediate” (instantaneous) state of their

operations. It also allows new algorithms to be easily implemented and compared on

the nodes without returning the results to the base station for processing.
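The container-encapsulation idea can be sketched as follows. The interface and class names here are the author's illustration, not SSDQP's actual API; a real clustering container would summarise tuples behind the same interface rather than buffer them.

```java
// Hypothetical sketch of the Storage Point concept: the stream-processing
// algorithm lives inside the data container, so new algorithms plug in
// without defining new query operators. Names are illustrative only.
import java.util.ArrayList;
import java.util.List;

interface StoragePoint {
    void insert(double[] tuple);      // ad hoc input: tuples arrive one at a time
    List<double[]> snapshot();        // instantaneous "intermediate" state
}

// Trivial container that simply buffers tuples; an ERA-style container would
// summarise them into clusters behind the same interface.
class BufferStoragePoint implements StoragePoint {
    private final List<double[]> rows = new ArrayList<double[]>();

    public void insert(double[] tuple) {
        rows.add(tuple.clone());
    }

    public List<double[]> snapshot() {
        return new ArrayList<double[]>(rows);  // state can be inspected at any time
    }
}
```

Because queries only ever see the `StoragePoint` interface, swapping the buffering container for a clustering one requires no change to the query operators.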

A generic resource-monitoring mechanism was also developed, enabling any

resource-aware algorithm to be implemented in SSDQP.
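As a rough illustration of such a generic monitor (the interface and class names are hypothetical, not the SSDQP classes), a resource-aware algorithm would poll normalised resource levels and adapt its behaviour accordingly:

```java
// Hypothetical sketch of a generic resource monitor. A resource-aware
// algorithm polls these fractions and adapts; names are illustrative only.
interface ResourceMonitor {
    double freeMemoryFraction();   // 0.0 .. 1.0
    double batteryFraction();      // 0.0 .. 1.0
}

class JvmResourceMonitor implements ResourceMonitor {
    public double freeMemoryFraction() {
        Runtime rt = Runtime.getRuntime();
        return (double) rt.freeMemory() / rt.totalMemory();
    }

    public double batteryFraction() {
        // Stub: battery level detection was unavailable on the hardware used.
        return 1.0;
    }
}
```

Reporting fractions rather than raw byte or voltage values keeps the algorithms independent of the underlying hardware.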


8 Future Work

Since SSDQP is an ongoing experimental system, improvements to the system will constantly

be made. Some suggestions for future work include:

1. Update the GUI to enable it to use and display the new features provided in this

project. This task should take no more than two weeks for one programmer.

2. Port the current SSDQP to the newer eSPOTs in order to take advantage of the newer

power management features. This may allow the features currently unavailable in this

project (battery level detection and CPU utilisation) to be used. This porting of the code

may take up to three weeks.

3. Optimise the grammar parsing in order to both reduce the overhead cost and

increase the readability of the code (the current implementation uses a recursive

technique to solve an iterative problem). This task should take approximately two weeks.

4. Investigate the new ability to deploy code through the radio interface, enabled in the

eSPOTs. This task is quite open-ended, and a realistic forecast is not possible without

some initial investigation into this eSPOT capability.

5. Optimise how and when sensors are activated. The current system activates all sensors

before projecting the measurements collected into the set required whenever it needs to

sense any of the measurements, including internal measurements. A possible option is to

replace the current sense operation with a specialised buffered table, which checks the

timestamp of the existing data and determines whether to return (and buffer) a new

reading, or simply return the buffered value. This task is also quite open-ended, and a new

project could be based on it, especially if this algorithm is to be made resource

aware.
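The buffered-table idea above might be sketched as follows; the class and method names are the author's illustration, not existing SSDQP code, and wall-clock time is passed in explicitly so the freshness logic is testable.

```java
// Sketch of the proposed buffered sensor table: return the cached reading
// while it is fresh, and activate the sensor only when the previous reading
// is older than the minimum sampling period. All names are illustrative.
import java.util.function.DoubleSupplier;

class BufferedSensor {
    private final long maxAgeMillis;
    private double cached;
    private long cachedAt;
    private boolean hasValue = false;

    BufferedSensor(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the buffered value if fresh enough, otherwise samples the sensor.
    double read(long nowMillis, DoubleSupplier sample) {
        if (!hasValue || nowMillis - cachedAt > maxAgeMillis) {
            cached = sample.getAsDouble();  // sensor activation happens only here
            cachedAt = nowMillis;
            hasValue = true;
        }
        return cached;
    }
}
```

With several queries sharing one `BufferedSensor` per measurement, sensors that are polled by many queries are still activated at most once per freshness window.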

6. Optimise the implementation of the ERA clustering algorithm such that an ordered list is

maintained in order to increase the efficiency of the insertion of new data. A possible

amendment is attached as a separate file with the corresponding mathematical proof.
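The general technique can be illustrated independently of the attached amendment: keeping the list sorted lets each insertion locate its position by binary search in O(log n) comparisons instead of a linear scan.

```java
// Generic illustration of ordered insertion (not the attached ERA amendment):
// a sorted list lets binary search find the insertion point in O(log n).
import java.util.Collections;
import java.util.List;

class OrderedInserter {
    static void insert(List<Double> sorted, double value) {
        int idx = Collections.binarySearch(sorted, value);
        if (idx < 0) idx = -idx - 1;   // convert "not found" result to insertion point
        sorted.add(idx, value);        // list remains sorted after every insert
    }
}
```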

7. Investigate whether the metadata on tables stored on or returned from the nodes is adequate.

Currently, projection may not work as expected on buffers/clusters because those tables do not

match the normal table format, and such tables are also difficult to display. One possible

solution would be to include an extra row of integers at the start of the result table to

denote the attributes being returned. This task may take up to three weeks for the

investigation, design and implementation of the solution.

8. Extend the Storage Point module by allowing the data stored in the node as well as the

metadata stored in the host application to be written to persistent storage. This extension

should take up to two weeks.

9. Develop and implement an algorithm to merge clusters returned from other nodes in

order to minimise network communication cost. This may require the returned table to be

identifiable as a cluster instead of a standard table in order to trigger the mechanism.

Language extensions may also be required for cases where the operator requires the

measurements to be associated with individual nodes.

This merging operation is also useful internally to the node as part of its adaptation to

reduced available memory. In some situations (such as a slowly increasing value), the ERA


clustering algorithm may move a cluster to overlap with another cluster. While the

movement of any cluster is bounded (due to averaging), merging these clusters as a part of

the memory adaptation routine may be feasible if they overlap by more than some

threshold amount.
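The overlap test above might be sketched, for one-dimensional clusters represented as a centre and radius, as follows. The representation and threshold rule are the author's illustration for this suggestion, not the ERA algorithm's actual structures.

```java
// Illustrative overlap-based merge for 1-D clusters (centre, radius, count).
// The representation and threshold rule are assumptions for this sketch.
class Cluster {
    double centre, radius;
    int count;

    Cluster(double centre, double radius, int count) {
        this.centre = centre;
        this.radius = radius;
        this.count = count;
    }

    // Overlap of the two intervals, as a fraction of the smaller cluster's width.
    static double overlap(Cluster a, Cluster b) {
        double olap = Math.min(a.centre + a.radius, b.centre + b.radius)
                    - Math.max(a.centre - a.radius, b.centre - b.radius);
        return olap / (2 * Math.min(a.radius, b.radius));
    }

    // Merge only when the overlap fraction exceeds the threshold; else null.
    static Cluster tryMerge(Cluster a, Cluster b, double threshold) {
        if (overlap(a, b) <= threshold) return null;
        int n = a.count + b.count;
        double centre = (a.centre * a.count + b.centre * b.count) / n;  // weighted mean
        double lo = Math.min(a.centre - a.radius, b.centre - b.radius);
        double hi = Math.max(a.centre + a.radius, b.centre + b.radius);
        return new Cluster(centre, (hi - lo) / 2, n);
    }
}
```

Weighting the merged centre by member counts keeps the merge consistent with the averaging behaviour that bounds cluster movement in the first place.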

10. Implement low-battery (input granularity) adaptation. This adaptation is described in [4],

but could not be implemented in the system because the algorithm was

implemented in a data container and conceptually should not be able to alter the sampling

period of the queries that use the container.

It is suggested that this adaptation be implemented in either the scheduler or the queries.

Implementing it in the scheduler provides a global adaptation to all queries running on the

node, whereas implementing it in the queries (specifically as either a separate Task or as a

parameter to the sensing Task) allows the operator to optionally disable low battery

adaptation for whatever reason.

Regardless of where this adaptation is implemented, synchronisation problems may occur

because the battery drain rate would be different across the network and the adaptation

modifies the frequency of the query. Therefore the design must be well thought through.

This adaptation could also be implemented in a special “sensor table” as described in

suggestion 5. This may be achieved by increasing the minimum sampling period. This

way, while the energy spent activating the sensor may be adapted, the adaptation does

not really change the input granularity, and would be less efficient than implementing it in

the scheduler or the query. Any aggregating operators may also produce misleading results

(the effect is yet to be investigated).
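As a toy illustration of the sampling-period variant of this adaptation (the linear scaling rule and the class name are assumptions for illustration; [4] defines the actual adaptation framework):

```java
// Toy sketch of low-battery adaptation via the sampling period: lengthen the
// period as the battery drains, capped at a maximum. The linear scaling rule
// is an assumption for illustration, not the rule from [4].
class SamplingAdapter {
    static long adaptPeriod(long basePeriodMs, double batteryFraction, long maxPeriodMs) {
        if (batteryFraction <= 0.0) return maxPeriodMs;  // battery exhausted: slowest rate
        long p = (long) (basePeriodMs / batteryFraction);
        return Math.min(p, maxPeriodMs);
    }
}
```

Note that because each node's battery drains at a different rate, applying such a rule per node is exactly what creates the synchronisation concern discussed above.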


9 References

[1] Mueller R, Alonso G and Kossmann D, SwissQM: Next Generation Data Processing in Sensor Networks, in the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA

[2] Röhm U, Scholz B and Gaber M, Integration of Data Stream Clustering into a Query Processor for Wireless Sensor Networks, in International Workshop on Data Intensive Sensor Networks 2007 (DISN’07), May 11, 2007, Mannheim, Germany

[3] Phung N, Gaber M and Röhm U, Resource-aware Online Data Mining in Wireless Sensor Networks, in the IEEE Symposium on Computational Intelligence and Data Mining (CIDM’07), April 1-5, 2007, Honolulu, Hawaii

[4] Gaber M and Yu P, “A Framework for Resource-aware Knowledge Discovery in Data Streams: A Holistic Approach with Its Application to Clustering”, in Proceedings of the ACM SAC ’06, Dijon, France: ACM Press, 2006

[5] Scholz B, Gaber M, Dawborn T, Khoury R, and Tse E, Efficient time triggered query processing in wireless sensor networks, to appear in: Proceedings of the 2007 International Conference on Embedded Systems and Software (ICESS-07) to be held in Daegu, Korea, May 14-16. Springer Press, 2007.

[6] Röhm U, Sun SPOT Distributed Query Processor (SSDQP) website, http://www.cs.usyd.edu.au/~roehm/projects/SSDQP.html, last accessed 24 May 07

[7] Dawborn T, Khoury R and Tse E, SSDQP: Sun SPOT Distributed Query Processor Software Development Guide (distributed with SSDQP)

[8] Dawborn T, Khoury R and Tse E, SSDQP: Sun SPOT Distributed Query Processor System Design (distributed with SSDQP)

[9] Younis O and Fahmy S, “HEED: a hybrid, energy efficient, distributed clustering approach for ad hoc sensor networks” in Proceedings of the ACM SAC ’06. Dijon, France: ACM Press, 2006.

[10] Teng W-G, Chen M-S and Yu P S, “Resource-aware mining with variable granularities in data streams”, in SIAM SDM 2004, 2004

[11] Röhm U, Adaptive In-Network Query Processing for Data-Intensive Sensor Networks, presented in “Data From The Field” Workshop, University of Sydney, 24 May 2007

[12] Madden S, Franklin M, Hellerstein J and Hong W, The Design of an Acquisitional Query Processor for Sensor Networks, In the Proceedings of SIGMOD 2003, June 9-12, San Diego, USA

[13] Sun Microsystems, Inc., Project Sun SPOT Products website, http://www.sunspotworld.com/products/, last accessed 24 May 07

Appendix A. Mathematical proof for a proposed optimisation (Attached as PDF)