Integration and Processing of Environmental Data: ADMIRE Technology. Ondrej Habala, CRISIS Seminar, 18.10.2011. ITMS 26240220060




Goals
- Accelerate access to and increase the benefits from data exploitation.
- Deliver consistent and easy-to-use technology for extracting information and knowledge.
- Cope with complexity, distribution, change and heterogeneity of services, data, and processes through an abstract view of data mining and integration.
- Provide power to users and developers of data mining and integration processes.

ADMIRE Architecture: Separation of Concerns
[Figure not preserved]

ADMIRE Architecture
[Figure not preserved]

ADMIRE's High-Level Architecture
[Figure not preserved]

ADMIRE Gateways USMT
[Figure not preserved]

DISPEL: Data Intensive Systems Process-Engineering Language
- Targets data-intensive distributed systems.
- Connection point between complex application requests and complex enactment systems.
- Benefit: method development, engineering and evolution of supported practices can take place independently in each world.
- Describes enactment requests for streaming-data workflow processes.
- Process-engineering time: transform and optimize the process in preparation for the enactment period.

DISPEL: Simple Example

    // Query strings and the data resource they are sent to
    String sql1 = "SELECT * FROM some_table";
    String sql2 = "SELECT * FROM table2";
    String resource = " ";

    SQLQuery query = new SQLQuery;

    // Creating streams of literals
    |- sql1, sql2 -| => query.expression;
    |- resource -| => query.resource;

    // Creating connections
    Tee tee = new Tee;
    query.result => tee.connectInput;

DISPEL real use
[Figure not preserved]

Application: Deployment of ADMIRE Technology in the Environmental Domain

Flood Application: Data sets used in hydrological scenarios
(FSKD 2010, Yantai, China, August)

| Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage |
|---|---|---|---|---|---|
| HUSAV | Hydrology | Data from two probes, containing water saturation of soil | 10s of MB | | Two distinct points |
| MARS | Meteorology | Historical meteorological data (temperature, rainfall, etc.) for Slovakia | 100s of MB | | Slovakia (grid 50x50 km) |
| SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | | distinct waterworks |
| DAISY | Pedology | Various pedological parameters for one probe in southern Slovakia | 10s of MB | | One point |
| WOFOST | Pedology | Crop data (with attached soil and meteorological data) for Slovakia, year […] | […]s of MB | 2006 | Slovakia (grid) |
| SHMU_CURR | Meteorology | On-line database of meteorological data copied from the SHMI web site, including radar imagery | 10s of GB | | Slovakia (about 100 distinct probes) |
| SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | | Slovakia (more than 100 distinct probes) |
| SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | | Slovakia (grid, various sizes) |
| RADAR | Meteorology | Weather radar imagery | 100s of GB | | Slovakia |
| SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | | Orava and upper Váh river |
| SOIL_RET | Pedology | Water retention capacities of soil | 10s of MB | current (no time series applicable) | Váh river watershed area |

Orava scenario
[Map not preserved]
Legend:
- Green area: Orava (part of northern Slovakia)
- Blue: Orava reservoir and local rivers
- Red dots: hydrological measurement stations
Notes:
- We are interested only in hydrological stations below the Orava reservoir.
- In our tests we use hydrological station 5830 (Tvrdošín).

ORAVA data mining concept
Predictors: rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature.
Targets: water level and water temperature at a station below the reservoir.

| Time | Water temp Orava | Rainfall Orava | Air temp Orava | Air temp Station | Rainfall Station | Outflow Orava | Water level Station | Water temp Station |
|---|---|---|---|---|---|---|---|---|
| T-4 | E-4 | R-4 | A-4 | B-4 | S-4 | D-4 | X-4 | Y-4 |
| T-3 | E-3 | R-3 | A-3 | B-3 | S-3 | D-3 | X-3 | Y-3 |
| T-2 | E-2 | R-2 | A-2 | B-2 | S-2 | D-2 | X-2 | Y-2 |
| T-1 | E-1 | R-1 | A-1 | B-1 | S-1 | D-1 | X-1 | Y-1 |
| T | E | R | A | B | S | D | X | Y |
| T+1 | | R+1 | A+1 | B+1 | S+1 | D+1 | X+1 | Y+1 |
| T+2 | | R+2 | A+2 | B+2 | S+2 | D+2 | X+2 | Y+2 |
| T+3 | | R+3 | A+3 | B+3 | S+3 | D+3 | X+3 | Y+3 |
| T+4 | | R+4 | A+4 | B+4 | S+4 | D+4 | X+4 | Y+4 |
| T+5 | | R+5 | A+5 | B+5 | S+5 | D+5 | X+5 | Y+5 |
| T+6 | | R+6 | A+6 | B+6 | S+6 | D+6 | X+6 | Y+6 |

Cell legend (the colour coding of the original table is not preserved): predicted by a meteo model; given in a schedule; targets of data mining.

ORAVA data integration
- Integration of data from: GRIB files, reservoirs
- Inputs: time period of experiment, reservoir ID, list of hydro stations, geo coordinates

ORAVA data sets

| Dataset | Domain | Description | Volume | Temporal coverage | Spatial coverage |
|---|---|---|---|---|---|
| SVP | Hydrology | Data from waterworks in western Slovakia (mainly river Váh): outflows, water levels, temperature, rainfall | 100s of MB | | distinct waterworks |
| SHMU_CURR | Meteorology | On-line database of meteorological data copied from the SHMI web site, including radar imagery | 10s of GB | | Slovakia (about 100 distinct probes) |
| SHMU_HIST | Meteorology | Historical meteorological data from SHMI probes | 100s of MB | | Slovakia (more than 100 distinct probes) |
| SHMU_GRIB | Meteorology | Historical temperatures and rainfall amounts in a gridded binary format | 100s of GB | | Slovakia (grid, various sizes) |
| SHMU_HYDRO | Hydrology | Historical data from hydrological measurement stations | 10s of MB | | Orava and upper Váh river |

ORAVA Scenario: Integrated and preprocessed data
[Figure not preserved: integrated raw data and integrated preprocessed data over time in hours. Columns: Water_temp Orava, Air_temp Orava, Rainfall Orava, Outflow Orava, Rainfall Station, Air_temp Station, Flow/Height Station, Water_temp Station.]

Orava Scenario: Water temperature prediction

| Properties \ Model | Linear regression | Multilayer perceptron |
|---|---|---|
| Correlation coefficient | | |
| Mean absolute error | | |
| Root mean squared error | | |
| Relative absolute error | % | % |
| Root relative squared error | % | % |
| Total number of instances | 8760 | |

Orava Scenario: Water level prediction

| Properties \ Model | Multilayer perceptron |
|---|---|
| Correlation coefficient | |
| Mean absolute error | |
| Root mean squared error | |
| Relative absolute error | % |
| Root relative squared error | % |
| Total number of instances | 8735 |
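The five quality measures reported for the Orava models can be computed from paired actual and predicted values. The following is a minimal Python sketch of the usual textbook definitions, not the tooling used in the project. The function name `regression_metrics` is hypothetical, and the baseline for the two relative errors is assumed to be the mean of the actual values in the evaluated set.

```python
import math


def regression_metrics(actual, predicted):
    """Return correlation, MAE, RMSE, RAE (%) and RRSE (%) for two equal-length sequences.

    Assumption: the relative errors compare the model against a baseline that
    always predicts the mean of `actual`. Constant inputs are not handled.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n

    sum_abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sum_sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Errors of the mean-value baseline
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)

    # Pearson correlation between actual and predicted values
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mae": sum_abs_err / n,
        "rmse": math.sqrt(sum_sq_err / n),
        "rae_percent": 100.0 * sum_abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sum_sq_err / base_sq),
    }


if __name__ == "__main__":
    # Made-up illustrative values, not project data
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

A relative error below 100 % means the model beats the mean-value baseline under this assumption.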
Orava Scenario workflows
- Data integration workflow [Figure not preserved]
- Training workflow [Figure not preserved]
- Prediction workflow [Figure not preserved]

Implementation Notes
- We needed to write custom activities for certain data extraction tasks.
- Data integration was the most complex part of the scenario in terms of workflow design.
- Data integration was quite easy to write and modify in DISPEL once we had all the PEs in place.
- We used a composite PE to extract different types of quantities from meteorological GRIB files.

Orava Scenario Portal
[Screenshots not preserved]

Radar Scenario
Very short-term rainfall prediction from weather radar data

Radar Scenario: Description
- Network of synoptic stations in Slovakia: 27 stations.
- Data used from the years 2007 and 2008.
- Available variables: rainfall, humidity, radar reflectivity, atmospheric pressure and temperature values for each hour.
- Very short-term rainfall prediction from weather radar data.
- Based on the movement of areas with higher air moisture content, and thus also higher precipitation potential.

Radar Scenario: Main predictors and target variables

| Time | Wind | Radar reflectivity | Rainfall Orava |
|---|---|---|---|
| T-2 | W-2 | D-2 | F-2 |
| T-1 | W-1 | D-1 | F-1 |
| T | W | D | F |
| T+1 | W+1 | D+1 | F+1 |
| T+2 | W+2 | D+2 | F+2 |

Overview of the main predictors and target variables in the Radar scenario. Green cells are predicted by the meteorological model. Blue cells come from a model based on motion vectors. Yellow cells are the final target of data mining. (The cell colours of the original table are not preserved.)

Radar Scenario: Attributes of the model
Isotonic regression model, 10-fold cross-validation

| Numerical characteristic | Value |
|---|---|
| Correlation coefficient | |
| Mean absolute error | |
| Root mean squared error | |
| Total number of instances | |

Hydro-meteorological performance

| Attribute \ Threshold | 0.3 mm | 0.6 mm |
|---|---|---|
| Probability of detection | | |
| Miss rate | | |
| Hanssen-Kuipers true skill score | | |
| Proportion correct | | |

RADAR model
Other tested models:
- Neural networks, SMOreg, linear regression, ...
- Reached correlation coefficients between 0.35 and 0.42.
- Validation: 10-fold cross-validation.
Problems in model creation:
- The process is significantly stochastic.
- Some input variables (humidity) depend in turn on the output rainfall.
- The meteorological process is very sensitive.
- The reflectivity matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations.

Radar Scenario workflows
- Training [Figure not preserved]
- Forecast [Figure not preserved]
- Motion vector computation [Figure not preserved]

SVP Scenario
Forecast of reservoir inflow based on temperature, precipitation and snow cover

SVP Scenario: Structure of data

| Time | Air Temperature | Rainfall Orava | Snow_prev | Snow | Inflow_prev | Inflow |
|---|---|---|---|---|---|---|
| t-1 | E(t-1) | R(t-1) | | S(t-1) | | F(t-1) |
| t | E(t) | R(t) | P(t) | S(t) | I(t) | F(t) |
| t+1 | E(t+1) | R(t+1) | P(t+1) | S(t+1) | I(t+1) | F(t+1) |
| t+2 | E(t+2) | R(t+2) | P(t+2) | S(t+2) | I(t+2) | F(t+2) |
| t+3 | E(t+3) | R(t+3) | P(t+3) | S(t+3) | I(t+3) | F(t+3) |
| t+4 | E(t+4) | R(t+4) | P(t+4) | S(t+4) | I(t+4) | F(t+4) |

Two steps of prediction:
1. Copy the previous values of snow quantity and inflow volume:
   P(t) = S(t-1), I(t) = F(t-1)
2. Apply the trained models (the snow model first, then the inflow model):
   S(t) = f(P(t), R(t), E(t)), F(t) = h(I(t), S(t), E(t), R(t))
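The two prediction steps can be sketched as a loop that feeds each predicted snow and inflow value back in as the "previous" value for the next time step. This is a minimal Python illustration under stated assumptions: `forecast`, `toy_snow_model` and `toy_inflow_model` are hypothetical names, and the two toy models are arbitrary stand-ins for the trained models f and h, which the slides do not specify.

```python
def forecast(snow_model, inflow_model, last_snow, last_inflow, rainfall, air_temp):
    """Roll the two-step prediction forward over forecast rainfall R and air temperature E.

    last_snow is S(t-1) and last_inflow is F(t-1), the last known values.
    Returns a list of (S, F) pairs, one per forecast step.
    """
    snow_prev, inflow_prev = last_snow, last_inflow
    results = []
    for r, e in zip(rainfall, air_temp):
        # Step 1: copy previous values, P(t) = S(t-1) and I(t) = F(t-1)
        p, i = snow_prev, inflow_prev
        # Step 2: snow model first, then inflow model
        s = snow_model(p, r, e)        # S(t) = f(P(t), R(t), E(t))
        f = inflow_model(i, s, e, r)   # F(t) = h(I(t), S(t), E(t), R(t))
        results.append((s, f))
        snow_prev, inflow_prev = s, f
    return results


# Hypothetical stand-ins for the trained models (not the models from the talk)
def toy_snow_model(p, r, e):
    # Freezing: precipitation accumulates as snow; otherwise snow melts with temperature
    return p + r if e <= 0 else max(0.0, p - e)


def toy_inflow_model(i, s, e, r):
    # Inflow decays, rain and snow melt contribute only above freezing
    rain_and_melt = r + 0.25 * s if e > 0 else 0.0
    return 0.5 * i + rain_and_melt


if __name__ == "__main__":
    steps = forecast(toy_snow_model, toy_inflow_model,
                     last_snow=4.0, last_inflow=8.0,
                     rainfall=[2.0, 0.0], air_temp=[-1.0, 2.0])
    print(steps)  # [(6.0, 4.0), (4.0, 3.0)]
```

Because each step consumes the previous step's output, model errors accumulate over the forecast horizon, which is one reason the snow model is applied before the inflow model at every step.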
SVP Scenario: Models & Attributes

[…]-fold cross-validation, 8760 records; models for inflow prediction

| Properties \ Model | Perceptron Neural Network | Gaussian Process | Linear Regression | Decision Tree M5P |
|---|---|---|---|---|
| Correlation coefficient | | | | |
| Mean absolute error | | | | |
| Root mean squared error | | | | |
| Relative absolute error | % | % | % | % |
| Root relative squared error | % | % | % | % |

N-fold cross-validation, 8760 records; Decision Tree Model M5P

| Properties \ N-Fold | N = 10 | N = 20 | N = 25 | N = 50 | N = 100 |
|---|---|---|---|---|---|
| Correlation coefficient | | | | | |
| Mean absolute error | | | | | |
| Root mean squared error | | | | | |
| Relative absolute error | % | % | % | % | % |
| Root relative squared error | % | % | % | % | % |

SVP Scenario workflows
- Data integration workflow [Figure not preserved]
- Model training workflow [Figure not preserved]
- Forecast workflow [Figure not preserved]

ADMIRE Tools
- Registry client GUI
- Process Designer
- SKSA
- Gateway Process Manager
- DMI Model Visualizer

Registry client GUI
- Read-only access to the ADMIRE Registry:
  - list PEs and view their properties
  - search and sort PEs
- Write access to the Registry is done via DISPEL documents.

Process Designer
- Manage your DMI project (files, directories, project structure).
- Edit your DMI process graphically.
- View the canonical (DISPEL) representation of your DMI process in real time.
- Select elements from the Registry.
- View the properties of your chosen elements.

Semantic Knowledge Sharing Assistant
- Provides access to existing users' knowledge, sorting and selecting it automatically according to the user's current working context.
- Screenshot callouts: the context the user works in (several reservoirs, one settlement); knowledge that may be useful in this context, previously entered by other users.

Gateway Process Manager
- Keep track of running processes:
  - stop/pause/cancel the process
  - view the process source (DISPEL)
  - access process results (if available) in several ways, raw or visualized

DMI Model Visualizer
- For data mining experts.
- Visualization of data mining models:
  - read a Weka classifier object
  - produce a PMML description of the model
  - show the PMML as a graphical tree

Custom Application Portal for end users (domain experts)
[Screenshot not preserved]

Thank you for your attention.