Energy Issues in Data Analytics
Domenico Talia, Carmela Comito
Università della Calabria & [email protected]
Motivations for Taking Care of Data
Data is everywhere (Big, complex, real-time, unstructured)
Putting data at the center of research work on energy issues
may bring some benefits. (Today the focus is on algorithms).
Cost metrics of data management techniques
(communication, storing, access, query, analysis) will help
professionals and users to save energy in data-intensive
apps.
Energy-scalable data management is important for
sustainable data science.
Data Availability or Data Deluge?
• Every life process today is data intensive.
• The information stored in digital data archives is enormous and its size is still growing very rapidly.
Data Availability or Data Deluge?
• Some decades ago the main problem was the shortage of information; now the challenge is
• the very large volume of information to deal with and
• the associated complexity to process it and to extract significant and useful parts or summaries.
Complex Big Problems …
• Bigger and more complex problems must be solved by using large-scale distributed computing systems.
• DATA SOURCES are larger and larger and ubiquitous (Web, sensor networks, mobile devices, telescopes, …).
… and Big Data
• Even where accessible, much data in many fields cannot be read by humans,
so
• the huge amount of data available today requires smart data analysis techniques to help people deal with it,
and
• scalable algorithms, techniques, and systems are needed (time and energy scalability).
Data: From Storing to Analysis
• Storing data is not the only main problem.
• A key issue is to analyse, mine, and process data to make it useful.
Source: The Economist
Towards Models for Energy-aware Data Management
The main focus today is on energy-aware
algorithms, tasks, applications.
The other side of the coin is data and costs of
operating on it.
Abstract energy-cost models for exchanging, accessing,
and transforming data are primary elements for
energy-aware data management at large scale.
They are useful for sustainable data science.
An Example: Energy-aware Mining of Data
We evaluated the energy cost of analyzing data by using some well-known data mining techniques on mobile devices.
Our interest was mainly in how the energy consumed by the same technique changes as the dimensions of the data change.
Tests with different
• Data set dimensions,
• Attribute number,
• Class number.
Data Mining Techniques
Energy characterization of data mining techniques running on mobile devices
• k-means (data clustering)
• J48 (data classification)
• Apriori (association rules)
Common performance parameters
• Number of instances (data set size)
• Number of attributes
Algorithm-specific performance parameters
• k-means: number of clusters
• J48: decision tree size
• Apriori: number of rules, minimum support, and minimum confidence
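The two Apriori thresholds named above can be made concrete with a small sketch. It is not the implementation used in the experiments and it omits Apriori's level-wise candidate pruning; it only shows how minimum support and minimum confidence filter candidate rules. The transactions and the function names (`support`, `confidence`, `rules`) are invented for illustration.

```python
from itertools import combinations

# Toy transactions, invented for illustration (not taken from the data sets in the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent union consequent) divided by support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def rules(transactions, min_support, min_confidence):
    """Enumerate single-item -> single-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            if (support({x, y}, transactions) >= min_support
                    and confidence({x}, {y}, transactions) >= min_confidence):
                found.append((x, y))
    return found

# Raising the minimum confidence shrinks the set of rules that survive.
print(rules(transactions, min_support=0.5, min_confidence=0.6))
print(rules(transactions, min_support=0.5, min_confidence=0.9))
```

Lower thresholds let more candidate itemsets and rules survive, which is why the number of rules, minimum support, and minimum confidence are treated as algorithm-specific parameters alongside the data set size.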
k-means (1)
Increasing the number of instances, with different numbers of clusters produced
k-means (2)
Increasing the number of attributes, with different numbers of clusters produced
Apriori (1)
Increasing the number of instances, with different numbers of attributes
Apriori (2)
Increasing the data set size, with different numbers of rules
Apriori (3)
Increasing the data set size, with different minimum confidence values
J48
Increasing the number of instances, with different numbers of attributes
Results on different devices
Results obtained with different smart phones
• Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
• HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
Results on different devices
Results obtained with different smart phones
• Sony Xperia P: 1 GHz dual-core ARM processor and 1 GB RAM
• HTC Hero: 528 MHz Qualcomm processor and 288 MB RAM
• Samsung Galaxy ACE: 800 MHz Qualcomm processor and 512 MB RAM
Concluding Remarks
Data-intensive applications call for energy cost models
based on data characteristics.
This should be done for sensors, smart phones, HPC servers,
and clouds and, in general, for large-scale computing systems.
Sustainable data center services and applications may benefit
from these models.
Preliminary experiments show useful data.
Data Sets
Census (http://archive.ics.uci.edu/ml/datasets/Census+Income)
• Used with k-means
• Data set size: 14 MB
• Number of instances: 244348
• Number of attributes: 11
Census_disc (http://archive.ics.uci.edu/ml/datasets/Census+Income)
• Used with Apriori
• Data set size: 19 MB
• Number of instances: 333011
• Number of attributes: 11
Covertype (http://archive.ics.uci.edu/ml/datasets/Covertype)
• Used with J48
• Data set size: 14.5 MB
• Number of instances: 114556
• Number of attributes: 55
Method | Algorithm | Data Set | Size | RAM Memory (MByte) | Virtual Memory (MByte) | CPU (%) | Battery Charge Depletion (mAh) | Energy Consumption (J) | Time (sec)
Association Rules, Rule Induction | Apriori | CENSUS_DISC.arff | 0,1 MB | 15,86 | 95,19 | 96,92 | 0 | 0 | 6
 | | | 0,2 MB | 16,97 | 105,36 | 98,03 | 0 | 0 | 12
 | | | 0,4 MB | 18,06 | 104,95 | 98,24 | 0 | 0 | 26
 | | | 0,8 MB | 19,87 | 102,75 | 98,13 | 2,7 | 35,964 | 73
 | | | 1,6 MB | 23,32 | 103,99 | 96,87 | 13,5 | 179,82 | 300
 | | | 3,2 MB | 26,92 | 100,01 | 95,44 | 23,3 | 310,356 | 3960
 | | | 6,4 MB | --- | --- | --- | --- | --- | ---
Classification, Trees | J48 | COVERTYPE.arff | 0,1 MB | 19,47 | 104,94 | 96,23 | 13,4 | 178,488 | 300
 | | | 0,2 MB | 20,15 | 104,92 | 98,21 | 29,8 | 396,936 | 540
 | | | 0,4 MB | 23,87 | 105,6 | 97,43 | 59,4 | 791,208 | 2040
 | | | 0,8 MB | 27,68 | 103,87 | 97,36 | 194,64 | 2592,6048 | 8160
 | | | 1,6 MB | --- | --- | --- | --- | --- | ---
 | | | 3,2 MB | --- | --- | --- | --- | --- | ---
 | | | 6,4 MB | --- | --- | --- | --- | --- | ---
Clustering, Instance-based/Lazy Learning | K-Means | CENSUS.arff | 0,1 MB | 16,73 | 96,56 | 98,03 | 6,75 | 89,91 | 55
 | | | 0,2 MB | 17,95 | 102,05 | 97,65 | 8,1 | 107,892 | 150
 | | | 0,4 MB | 19,72 | 102,16 | 97,02 | 18,9 | 251,748 | 300
 | | | 0,8 MB | 23,08 | 101,86 | 97,97 | 18,9 | 251,748 | 600
 | | | 1,6 MB | 26,4 | 95,96 | 97,82 | 43,2 | 575,424 | 1320
 | | | 3,2 MB | --- | --- | --- | --- | --- | ---
 | | | 6,4 MB | --- | --- | --- | --- | --- | ---
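The energy column in the table is consistent with converting battery charge depletion to joules as E = Q × 3.6 × V, since 1 mAh equals 3.6 coulombs. The slides do not state the voltage; the 3.7 V used below is an assumption, inferred only because it reproduces the reported values (for example 2,7 mAh gives 35,964 J). The function names are hypothetical, and the average-power figure is a derived quantity that does not appear in the slides.

```python
# Assumption: nominal battery voltage of 3.7 V. It is not stated in the slides;
# it is inferred because it reproduces the table's energy column.
NOMINAL_VOLTAGE_V = 3.7

def mah_to_joules(charge_mah, voltage_v=NOMINAL_VOLTAGE_V):
    """Convert battery charge (mAh) to energy (J): 1 mAh = 3.6 C and E = Q * V."""
    return charge_mah * 3.6 * voltage_v

def average_power_watts(energy_j, time_s):
    """Mean power drawn over a run: energy divided by elapsed time."""
    return energy_j / time_s

# (charge depletion in mAh, reported energy in J, time in s), copied from the table:
# Apriori at 0,8 MB, Apriori at 1,6 MB, and J48 at 0,8 MB.
rows = [(2.7, 35.964, 73), (13.5, 179.82, 300), (194.64, 2592.6048, 8160)]

for mah, reported_j, secs in rows:
    computed_j = mah_to_joules(mah)
    power_w = average_power_watts(computed_j, secs)
    print(f"{mah} mAh -> {computed_j:.4f} J (table: {reported_j} J), "
          f"average power {power_w:.3f} W")
```

Dividing energy by time is a simple way to compare runs of different lengths; it assumes the battery-depletion reading covers the whole run.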