Data management and data analysis for offshore windparks
Ralf Herrmann¹, Jens Rabe²
¹ LUH, Institute of Concrete Construction (IfMa)
² FhG, Fraunhofer Institute for Wind Energy and Energy System Technology (IWES)
7th GIGAWIND Symposium, 2 March 2017, Leibniz Universität Hannover
Slide 2: Ralf Herrmann - Data management and data analysis for offshore windparks
Overview
Introduction
Is Structural Health Monitoring a Big Data problem?
Tasks for data processing
Distributed data processing
Data management and data analysis in Gigawind life
Conclusions
Slide 3
Introduction
alpha ventus
two wind turbines extensively equipped with sensors
Senvion 5M (AV4) and Adwen Multibrid M5000 (AV7)
producing energy and important data day by day
Photo: Martina Nolte / License: Creative Commons CC-by-sa-3.0 de
Slide 4
How to handle all the data?
measurements started in April 2010
approx. 5 billion values every day
6 TByte each year
data is basis for lifecycle analysis
data is accessible on the Internet on the WEDA Platform for download
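A rough cross-check of the two rates quoted above can be sketched as follows. It assumes decimal units (1 TByte = 10^12 bytes) and a 365-day year; neither assumption is stated on the slide, so the results are only indicative.

```python
# Back-of-the-envelope check of the stated data rates.
# Assumptions (not from the slides): 1 TByte = 10**12 bytes, 365 days per year.
VALUES_PER_DAY = 5e9      # "approx. 5 billion values every day"
BYTES_PER_YEAR = 6e12     # "6 TByte each year"

values_per_year = VALUES_PER_DAY * 365
bytes_per_value = BYTES_PER_YEAR / values_per_year
values_per_second = VALUES_PER_DAY / (24 * 3600)

print(f"values per year:   {values_per_year:.3e}")
print(f"bytes per value:   {bytes_per_value:.2f}")
print(f"values per second: {values_per_second:.0f}")
```

Under these assumptions, each stored value occupies a little over 3 bytes on average, and roughly 58,000 values arrive every second.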
Photos: [Dur2016] (both labeled AV7)
Slide 5
How to handle all the data?
Insights of the architects of the WEDA system [Gud2013]
“When the data is there, researchers want it all”
“Online analysis […] requires a distributed solution”
“Data quality is an issue”
“constant tuning and optimization as the data volume grows”
Slide 6
Characteristics of a Big Data System
Visibility: access from various or multiple locations for multiple users
Volume: large amounts of records and files (gigabytes, terabytes, petabytes)
Velocity: streams of data, fast availability for access and delivery, high sampling rates (per minute, per second, in real time)
Variety: analysis depends on the data format (structured, semi-structured or unstructured data)
Value / Veracity: generates additional value for multiple users; development of reliable systems for higher precision and improved quality of data analysis
[NI2014], [Wrobel2012], [Fas2016]
Slide 7
Is Structural Health Monitoring a Big Data problem?
Volume: the volume considered is high (>42 TByte), growing by 6 TByte each year from 1,500 sensors (33 GByte per sensor) [Gud2013]
Velocity: results are needed as fast as possible, high sampling rates
Variety: data exists as comma-separated values (CSV) files, webcam pictures and data from other sources (weather)
Value: prepared data is valuable for research, industry, standards associations, certification institutes, data resellers
Visibility: data access is needed from interested parties all over the world
Slide 8
Looking for the right system
Photo: [Mer2014]
Slide 9
Transactional and Analytical data processing
OLTP (online transactional processing)
focus on data operations (insert, copy, update, delete)
quick processing
simple algorithms, query language
user interactions with data
transaction safety
OLAP (online analytical processing)
focus on analytical calculations
use of analytical algorithms that usually already exist
complex queries
historical and archive data
batch-operations
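The contrast between the two processing styles can be sketched with Python's built-in `sqlite3` module. The `readings` table, its columns and its values are hypothetical and chosen only for illustration; a single-file SQLite database merely stands in for the far larger systems discussed here.

```python
import sqlite3

# Hypothetical table of sensor readings; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, ts INTEGER, value REAL)")

# Transactional (OLTP-style): small inserts and updates inside one safe transaction.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [(1, 0, 2.0), (1, 1, 4.0), (2, 0, 10.0), (2, 1, 30.0)],
    )
    conn.execute("UPDATE readings SET value = 20.0 WHERE sensor_id = 2 AND ts = 1")

# Transactional read: fetch one specific value quickly.
single = conn.execute(
    "SELECT value FROM readings WHERE sensor_id = ? AND ts = ?", (1, 1)
).fetchone()[0]

# Analytical (OLAP-style): aggregate over all stored (historical) rows.
stats = conn.execute(
    "SELECT sensor_id, COUNT(*), AVG(value), MAX(value) "
    "FROM readings GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()

print(single)
print(stats)
```

The first half touches individual rows and relies on transaction safety; the second half scans every row to compute statistics, which is the access pattern that dominates analytical workloads.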
Slide 10
What kind of work is done?
development of algorithms
inspection of the data
alarm- and event-based notifications
well-defined access restrictions on the data
manual data manipulation for data cleansing
evaluation of large amounts of data
statistical calculations
complex algorithms, e.g. rainflow counting
automated data validation checks
(tasks listed from transactional behavior to analytical behavior)
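As an illustration of the rainflow counting mentioned above, the following is a minimal sketch of the common three-point counting rule (the variant described in ASTM E1049). It is not necessarily the implementation used in Gigawind; the function names and the sample load history are assumptions made for this example.

```python
from collections import defaultdict, deque

def reversals(series):
    """Return the turning points of the series, keeping the first and last sample."""
    points = []
    for x in series:
        if points and x == points[-1]:
            continue                      # skip repeated values
        if len(points) >= 2 and (points[-1] - points[-2]) * (x - points[-1]) > 0:
            points[-1] = x                # same direction: extend the current slope
        else:
            points.append(x)
    return points

def rainflow(series):
    """Count cycles with the three-point rainflow rule; returns {range: count}."""
    counts = defaultdict(float)
    stack = deque()
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])   # most recent range
            y = abs(stack[-2] - stack[-3])   # previous range
            if x < y:
                break
            if len(stack) == 3:
                counts[y] += 0.5             # range includes the start: half cycle
                stack.popleft()
            else:
                counts[y] += 1.0             # closed loop: full cycle
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    while len(stack) > 1:                    # leftover ranges count as half cycles
        counts[abs(stack[1] - stack[0])] += 0.5
        stack.popleft()
    return dict(counts)

cycles = rainflow([-2, 1, -3, 5, -1, 3, -4, 4, -2])
print(sorted(cycles.items()))
```

Each decision depends on the turning points still held on the stack, so the count for one part of a signal cannot be finished without knowing what came before it. This sequential dependency is one reason why cycle counting is harder to distribute than simple statistics.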
Slide 11
Example
OLTP (online transactional processing)
find the right piece of data quickly …
… to use it in your own algorithm/software, illustrated as f( ) = 27
OLAP (online analytical processing)
process a large amount of data
… e.g. empirical cumulative distribution functions
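A minimal sketch of an empirical cumulative distribution function, using only the Python standard library, is given here. The sample values are made up; on the real data set the sorting step would have to run over the whole time series, which is what makes this an analytical task.

```python
from bisect import bisect_right

def ecdf(samples):
    """Return F where F(x) is the fraction of samples less than or equal to x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one sample")

    def F(x):
        # bisect_right counts how many sorted samples are <= x
        return bisect_right(ordered, x) / n

    return F

F = ecdf([3.0, 1.0, 2.0, 2.0, 5.0])
print(F(0.5), F(2.0), F(4.0), F(5.0))
```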
Slide 12
Transactional and Analytical data processing
data systems are designed for either transactional or analytical processing
good OLAP performance degrades OLTP performance and vice versa
the same problem exists with traditional Big Data databases or data systems
Apache Cassandra, MySQL Cluster
-> transactional data processing
Apache Hadoop
-> analytical data processing
Figure: average response time [s] of OLTP and OLAP queries for two workloads: (1) mainly transactional queries in parallel, (2) mainly analytical queries in parallel. Source: [Bog2011]
Slide 13
Where is the data processed?
our office computers are not sufficient anymore
too little disk space, main memory and processing power
distributed processing systems:
most widespread and most used open-source framework:
Apache Hadoop
can work as a filesystem or database
Image: Apache Software Foundation
Slide 14
Distributed storage of data
a cluster consists of many computers (nodes)
Hadoop Distributed File System (HDFS)
data is split into parts
NameNode: knows the locations of files in the file system
DataNode: stores a part of a file
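Read sequence (Client, NameNode and several DataNodes inside Hadoop):
1) client requests a file from the NameNode
2) NameNode returns the list of relevant DataNodes
3) client reads the data parts from the DataNodes
4) Hadoop replicates data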
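The read sequence above can be mimicked with a toy in-memory model. This is not the Hadoop API: every class, function and file name below is hypothetical, and part placement is reduced to a simple round-robin rule purely for illustration.

```python
# Toy model of the read sequence above. This is NOT the Hadoop API;
# every class, function and file name here is hypothetical.

class DataNode:
    def __init__(self):
        self.parts = {}                       # part id -> bytes

class NameNode:
    def __init__(self):
        self.locations = {}                   # file name -> [(part id, [node indices])]

    def lookup(self, name):
        """Steps 1 and 2: answer a file request with the relevant DataNodes."""
        return self.locations[name]

def store(namenode, datanodes, name, data, part_size=4, replicas=2):
    """Slice data into parts and place each part on `replicas` nodes (step 4)."""
    entries = []
    for index, start in enumerate(range(0, len(data), part_size)):
        part_id = f"{name}#{index}"
        holders = [(index + r) % len(datanodes) for r in range(replicas)]
        for h in holders:
            datanodes[h].parts[part_id] = data[start:start + part_size]
        entries.append((part_id, holders))
    namenode.locations[name] = entries

def read(namenode, datanodes, name, failed=frozenset()):
    """Step 3: read every part from the first replica that is still reachable."""
    chunks = []
    for part_id, holders in namenode.lookup(name):
        holder = next(h for h in holders if h not in failed)
        chunks.append(datanodes[holder].parts[part_id])
    return b"".join(chunks)

datanodes = [DataNode() for _ in range(3)]
namenode = NameNode()
store(namenode, datanodes, "example.csv", b"0123456789")
print(read(namenode, datanodes, "example.csv"))
print(read(namenode, datanodes, "example.csv", failed={0}))
```

Because every part is held by two nodes in this sketch, the file can still be read in full when one node is marked as failed, which is the purpose of replication.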
Slide 15
Distributed processing of data
For analysis: data is not sent to the client; instead the algorithm is sent to the cluster
Master Node: distributes the algorithm to all nodes
Computational Node: executes the algorithm together with the other nodes
Algorithms need to follow a programming model
Processing sequence (Client, Master Node and several Computational Nodes inside Hadoop):
1) client sends an analytics task to the Master Node
2) Master Node assigns tasks to the Computational Nodes
3) Computational Nodes can interchange map results
4) results are stored inside the HDFS
Slide 16
MapReduce
programming technique for analyzing data sets that do not fit in memory
algorithms need to be programmed to run in parallel
difficult task, including parallelization, fault tolerance, data distribution and load balancing
simplified by a programming model: MapReduce
Figure: input data is split into parts; map partitions each part by a property (keys k1, k2, k3); the partitioned data goes to other reducers or comes from other mappers; merge & reduce produces the results
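The map, partition and reduce stages can be sketched in plain Python for a 10-minute average over time series data. This runs in a single process and only mirrors the structure of the programming model; it does not use Hadoop, and the function names, window length and sample data are assumptions for illustration.

```python
from collections import defaultdict

def map_part(part, window=600):
    """Map: emit (window start, value) for every (timestamp in s, value) sample."""
    return [(ts - ts % window, value) for ts, value in part]

def shuffle(mapped_parts):
    """Group mapped values by key so each key ends up at exactly one reducer."""
    groups = defaultdict(list)
    for mapped in mapped_parts:
        for key, value in mapped:
            groups[key].append(value)
    return groups

def reduce_mean(values):
    """Reduce: merge all values of one key into their mean."""
    return sum(values) / len(values)

# Two input parts, as they might be stored on two different nodes (made-up data).
parts = [
    [(0, 1.0), (300, 3.0), (600, 10.0)],
    [(599, 5.0), (900, 20.0), (1200, 7.0)],
]
mapped = [map_part(p) for p in parts]          # could run in parallel, one per part
results = {k: reduce_mean(v) for k, v in sorted(shuffle(mapped).items())}
print(results)
```

The map step needs no knowledge of other parts, so it parallelizes freely; only the grouping by key requires data to move between nodes, as with the sample at 599 s that belongs to the first window but is stored in the second part.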
Slide 17
Data management and data analysis in Gigawind life
LAMA (analytical operations, here the big data happens): I/O logic for time series data, Distributed Database
SMMEXS (transactional operations): User Interface / Data Import, Data and Evaluation Management, transactional database
Slide 18
Key Features of SMMEXS
management of sensors and evaluations
provides relevant information on the sensor data
user interface for configuration
automated data import
runs periodic analyses on newly incoming data
prepared algorithms for civil-engineering-related analytics
works on measurement data, images, audio files
Slide 19
Key Features of LAMA
redundant, fault-tolerant data storage
processing algorithms can be run directly on the cluster
no data upload / download necessary besides parameters and results
tasks run on the nodes that hold the data locally, minimizing network traffic
Java- and .NET-based interfaces
can be used with LabVIEW, MATLAB and more
2x to 10x faster than running locally, depending on type of algorithm
Slide 20
Performance of the LAMA data storage
performance on a 50 GiB sample
Figure: computation time in minutes (lower is better) for MATLAB, LAMA using database and LAMA using files, in two panels:
Parallelizable task: 10 minute average
Non-parallelizable task: cycle counting
Slide 21
Results from the system in Gigawind life
• data synchronized
• data cleaned
• data filtered
• data statistically evaluated
Slide 22
Conclusions
the amount of data will grow, so algorithms are sent to the data
development of algorithms with parallelism is an essential task
you can’t get your local copy
many research areas use Big Data with great success (medicine, economy, BIM, predictive maintenance)
Big Data analytics is notably impacting the Civil Engineering domain [Alavi2017]
current Civil Engineering information systems are still lacking in successful implementation [Alavi2017]
Slide 23
Thank you very much for your attention
For details, see the poster at the poster session:
Fast and scalable distributed storage system for measuring and simulation data
Slide 24
References
[Bog2011] A. Bog, K. Sachs, and A. Zeier, “Benchmarking Database Design for Mixed OLTP and OLAP Workloads,” in ICPE’11: Proceedings of the 2nd ACM/SPEC International Conference on Performance Engineering, March 14-16, 2011, Karlsruhe, Germany. New York: ACM, 2011.
[Gud2013] S. Gudenkauf and A. Claassen, “Data Warehousing for Distributed Offshore Research at Alpha Ventus - Overview and Insights Gained,” in Proceedings of the 27th EnviroInfo 2013 Conference. Hamburg: Shaker-Verlag, 2013.
[Mer2014] R. Merino, “Trafodion: How to use Hadoop for operational and transactional purposes,” URL: https://de.slideshare.net/BigDataSpain/rodrigo-merino-how-to-use-hadoop-for-operational-and-transactional-purposes-big-data-spain-2014 (28 February 2017)
[Alavi2017] A. H. Alavi and A. H. Gandomi, “Big Data in Civil Engineering,” Automation in Construction, January 2017. doi:10.1016/j.autcon.2016.12.008.
[NI2014] National Instruments, “Big Analog Data™ Solutions,” URL: www.ni.com/pdf/info/us/big_analog_data_presentation.pdf (28 February 2017)
[Wrobel2012] S. Wrobel, “Big Data – Vorsprung durch Wissen Chancen erkennen und nutzen,” 2012. URL: https://www.iais.fraunhofer.de/content/dam/iais/gf/bda/Downloads/FraunhoferIAIS_Big-Data_2012-12-10.pdf (28 February 2017)
[Fas2016] D. Fasel and A. Meier, Big Data: Grundlagen, Systeme und Nutzungspotenziale. Wiesbaden: Springer Vieweg, 2016.
[Dur2016] M. Durstewitz and B. Lange, Meer-Wind-Strom: Forschung am ersten deutschen Offshore-Windpark, 1st edition. Wiesbaden: Springer Fachmedien Verlag, 2016.