

II. Conditional Querying:
• Current ARM data searches are performed using metadata. NoSQL can improve the data searching capabilities by including measurement values.
• Data: ARMBEATM [2] provides best estimates of selected atmospheric state profiles and surface quantities, averaged every hour.

I. LASSO:
• Lays the groundwork for routine LES modeling at the ARM Southern Great Plains (SGP) site.
• Result: a library of simulations.
• A “Data Bundle” for LASSO combines ARM observations and high-resolution model output to provide a highly detailed description of the atmosphere.

1. Objective

2. Introduction

3. Workflow

5. Visualization

6. Data Retrieval, Processing, and Storage

5.1 Interactive Visualizations for LASSO and ARM Observational Data

Cassandra
• NoSQL Database
• Distributed Environment
• High Availability
• Elastic Scalability
• Column Store
• Easy to Use
• Good Documentation

Spark
• Processing Framework
• Distributed Environment
• In-Memory Processing
• Master-Worker Architecture
• Cassandra Compatible
• Written in Scala
• Good Documentation

Production Cluster Details

Cassandra cluster:
• Number of nodes: 5
• RAM: 256 GB
• CPU cores: 32
• Storage: 2 nodes with 3 TB SSD, 3 nodes with 3 TB HDD

Spark cluster:
• Number of nodes: 5 (1 master and 4 workers)
• RAM: 256 GB
• CPU cores: 32
• Storage: 2 nodes with 3 TB SSD, 3 nodes with 3 TB HDD

1. https://www.arm.gov/capabilities/modeling/lasso/

2. http://www.arm.gov/data/vaps/armbe/armbeatm

3. https://github.com/mbostock/d3/wiki/Tutorials

7. “We’d like to hear from you….”

Large-Scale Data Analysis and Visualization for ARM Using NoSQL Technologies

Bhargavi Krishna µ, Kyle Dumas µ, William Gustafson ⍵, Andrew Vogelmann ⍴, Tami Toto ⍴, and Giri Prakash µ

µ Oak Ridge National Laboratory
⍵ Pacific Northwest National Laboratory
⍴ Brookhaven National Laboratory

5.3 Interactive Visualizations Using RADAR Data

Histogram Visualization: shows histograms of different variables from the ARMBE data, grouped by cloud type.

The cloud type values were provided by Laura Riihimaki at the same time sampling as the ARMBE data.
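
The grouping behind such a histogram can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the field names `cloud_type` and `pressure_kpa`, the bin edges, and the sample values are hypothetical, and the poster's own processing is done in a Spark Scala application rather than in this code.

```python
from collections import defaultdict


def histogram_by_cloud_type(records, variable, bin_edges):
    """Count values of `variable` per bin, separately for each cloud type.

    Bins are half-open [low, high), except the last one, which also
    includes its upper edge.
    """
    n_bins = len(bin_edges) - 1
    counts = defaultdict(lambda: [0] * n_bins)
    for rec in records:
        value = rec.get(variable)
        if value is None:
            continue  # skip missing measurements
        for i in range(n_bins):
            low, high = bin_edges[i], bin_edges[i + 1]
            is_last = i == n_bins - 1
            if low <= value < high or (is_last and value == high):
                counts[rec["cloud_type"]][i] += 1
                break
    return dict(counts)


# Hypothetical sample rows; real rows would pair ARMBE values with cloud types.
records = [
    {"cloud_type": "low", "pressure_kpa": 97.2},
    {"cloud_type": "low", "pressure_kpa": 98.4},
    {"cloud_type": "high", "pressure_kpa": 96.1},
    {"cloud_type": "low", "pressure_kpa": 99.0},
]
hist = histogram_by_cloud_type(records, "pressure_kpa", [96.0, 97.0, 98.0, 99.0])
print(hist)
```

Each cloud type gets its own list of bin counts, which is the shape a front end needs to draw one histogram per cloud type.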

4. Technology

8. References

Bhargavi Krishna, Ph.D.

Email: [email protected]

Phone: 865-574-8264

Parallel Coordinates: Each measurement (dimension) is a coordinate axis. A range can be selected on each axis, and the axes are interchangeable.
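
The two interactions described here, selecting a range on each axis and swapping axes, reduce to a simple filter and a reorder. The sketch below shows that logic in plain Python; the function names `brush` and `reorder_axes`, the dimension names, and the sample values are hypothetical and are not taken from the poster's D3.js code.

```python
def brush(records, selected_ranges):
    """Keep records whose value lies inside every selected (low, high) range.

    Dimensions without a selected range are left unconstrained.
    """
    return [
        rec
        for rec in records
        if all(low <= rec[dim] <= high for dim, (low, high) in selected_ranges.items())
    ]


def reorder_axes(axes, i, j):
    """Return a new axis order with the axes at positions i and j swapped."""
    reordered = list(axes)
    reordered[i], reordered[j] = reordered[j], reordered[i]
    return reordered


# Hypothetical sample rows with three measurement dimensions.
records = [
    {"temperature_k": 271.0, "pressure_kpa": 97.5, "rh_percent": 80.0},
    {"temperature_k": 285.0, "pressure_kpa": 98.1, "rh_percent": 40.0},
    {"temperature_k": 272.5, "pressure_kpa": 96.2, "rh_percent": 65.0},
]
selected = brush(
    records,
    {"temperature_k": (270.0, 273.15), "pressure_kpa": (97.0, 99.0)},
)
axes = reorder_axes(["temperature_k", "pressure_kpa", "rh_percent"], 0, 2)
print(selected, axes)
```

A record stays highlighted only if it passes every active range, so adding a range on another axis can only narrow the selection.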

Provide near real-time analytics and visualizations for ARM data, such as data from LES ARM Symbiotic Simulation and Observation (LASSO [1]), radar, and best-estimate value-added products.

Giri Prakash

Email: [email protected]

Phone: 865-241-5926

http://archive.arm.gov/lassobrowser

• Use cases were generated from the ARMBEATM [2] datastream. A Spark Scala application generated the outputs for conditional querying and statistics.

• Data are retrieved dynamically based on user selection.

Conditional querying:
Use Case: The table below shows the days in 2012 on which the surface temperature at SGP was below 0 °C (273.15 K).
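
The use case above amounts to a predicate on measurement values followed by a grouping by day. A minimal plain-Python sketch follows. It assumes that a day qualifies if any hourly value falls below the threshold, and the field names `site`, `time`, and `surface_temp_k` as well as the sample values are hypothetical. The poster's outputs come from a Spark Scala application reading Cassandra, which this sketch does not reproduce.

```python
from datetime import datetime

FREEZING_K = 273.15  # 0 °C expressed in kelvin


def days_below(records, threshold_k=FREEZING_K, year=2012, site="SGP"):
    """Return sorted ISO dates on which any record is below the threshold."""
    days = set()
    for rec in records:
        timestamp = datetime.fromisoformat(rec["time"])
        if (
            rec["site"] == site
            and timestamp.year == year
            and rec["surface_temp_k"] < threshold_k
        ):
            days.add(timestamp.date().isoformat())
    return sorted(days)


# Hypothetical hourly records, not real ARMBEATM values.
records = [
    {"site": "SGP", "time": "2012-01-12T05:00:00", "surface_temp_k": 270.4},
    {"site": "SGP", "time": "2012-01-12T06:00:00", "surface_temp_k": 269.9},
    {"site": "SGP", "time": "2012-07-04T15:00:00", "surface_temp_k": 305.2},
    {"site": "SGP", "time": "2011-12-31T23:00:00", "surface_temp_k": 268.0},
    {"site": "SGP", "time": "2012-02-03T04:00:00", "surface_temp_k": 273.15},
    {"site": "SGP", "time": "2012-12-25T07:00:00", "surface_temp_k": 265.3},
]
cold_days = days_below(records)
print(cold_days)
```

A metadata-only search could narrow the records to the site and year but could not apply the temperature condition, which is the gap conditional querying is meant to close.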

Step 1: Data Retrieval & Storage
Step 2: Data Processing
Step 3: Visualization

[Workflow diagram. Components: Inputs (netCDF), Data Loader, PostgreSQL (Metadata), Cassandra, Spark (Scala Application), Node.js, D3.js, Browser, User. Arrow labels: Stores, Retrieves, Stores, Sends data, Retrieves, Plots, Conditional querying.]

Retrieves: Raw data
Stores: Processed/Statistical data
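
The three steps can be sketched end to end with in-memory stand-ins. This is a hedged illustration only: the dictionaries `raw_store` and `processed_store` stand in for database tables, the functions `load`, `process`, and `serve` are hypothetical names, and the reading that the processing step retrieves raw data and writes back statistics is an assumption drawn from the diagram labels. No Cassandra, Spark, or Node.js API is used.

```python
import json
from statistics import mean

raw_store = {}        # stand-in for raw data tables
processed_store = {}  # stand-in for processed/statistical tables


def load(datastream, rows):
    """Step 1: store parsed input rows under a datastream name."""
    raw_store.setdefault(datastream, []).extend(rows)


def process(datastream, variable):
    """Step 2: retrieve raw rows, compute summary statistics, store them."""
    values = [
        row[variable]
        for row in raw_store[datastream]
        if row.get(variable) is not None
    ]
    summary = {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }
    processed_store[(datastream, variable)] = summary
    return summary


def serve(datastream, variable):
    """Step 3: return the stored summary as JSON for a plotting front end."""
    return json.dumps(processed_store[(datastream, variable)], sort_keys=True)


# Hypothetical rows; real input would be parsed from netCDF files.
load("armbeatm", [
    {"surface_temp_k": 270.0},
    {"surface_temp_k": 280.0},
    {"surface_temp_k": None},
])
process("armbeatm", "surface_temp_k")
payload = serve("armbeatm", "surface_temp_k")
print(payload)
```

Keeping raw and processed data in separate stores means the visualization step reads only small precomputed summaries, which is one plausible reason for the split shown in the workflow.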

The two figures below show a sample statistical summary of radar data as a map and as a multi-line time-series plot. The data were provided by Scott Collis and Jonathan Helmus.

[Figure: Cloud Type: Low Clouds; Year: 2010; axis label: Pressure (kPa)]

5.2 Interactive Visualizations for ARMBE Data