21
Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest National Laboratory (PNNL) Big Data Access, Analytics and Sense-Making IEEE PES AMPS Big Data Subcommittee Webinar October 10, 2018 PNNL-SA-138532

Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEELaboratory Fellow/Technical Group ManagerPacific Northwest National Laboratory (PNNL)

Big Data Access, Analytics and Sense-Making

IEEE PES AMPS Big Data SubcommitteeWebinarOctober 10, 2018

PNNL-SA-138532

Page 2: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Deployment of a vast new phasor network is generating unprecedented real-time data

2

April 2007 March 2015

Today – SCADA data  Emerging – phasor data Improvement

Variety voltage + current + phase angle, … more information

Velocity 1 sample / 4 seconds 30‐120 samples / second ~200x faster

Volume 8 terabytes / year 1.5 petabytes / year  ~200x more data

Veracity unseen ms‐oscillations  oscillations seen at 10ms greater accuracy

Page 3: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Smart devices and 2-way communication offer new opportunities, greater complexity

3

60,000 Smart Meters

Page 4: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

More diverse data add to the complexity

Weather/climate data (e.g. PNNL ARM* Data)

300 instruments, 2000 data streams 24/7500 GB/day rising to multiple TBs/dayCurating 20 years’ data

Market/business dataCyber/communication dataSimulated data

Each contingency scenario generates 0.5M bytes data, adding up to TB scale

4

Contingency Analysis

Number of scenarios

Serial computing on 1

processor

Parallel computing on

512 processors

Parallel computing on 10,000

processors

WECC N-1 (full) 20,000 4 hours~30 seconds

469x speed upWECC N-2 (partial) 153,600 26 hours

~3 minutes492x speed up

~12 seconds7877x speed up

*ARM: Atmospheric Radiation Measurement

Page 5: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Data volume comparison: grid vs. big science data

100 103KB

106MB

109GB

1012TB

1015PB

Page 6: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Real-time data ingestion, retrieval, curationfrom a distributed sensor network

RequirementsCyber-secure sensor networkData provenance and privacyReal-time processing

Solution: scalable, flexible middleware and R/Hadoop statistical analysis capabilities

Data ingestion is 103 times faster than MySQLLinearly scales to many nodesData curation cleans data and detect events with confidence in real-time

MySQL – Retrieval 

MySQL – Ingestion 

Data Ingestion and Retrieval 

New Method – Ingestion 

New Method – Retrieval

Define problem

Define model

Run model over entire data set

Select interesting subsets of the data

Analyze results / patterns

Model validated

Refinemodel

Data Curation 

Page 7: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Making data accessible is a big challenge

Organizing and converting data to application specific formats

SQL TablesNoSQL

CSV

Text files

PI

Time series

Operational or Planning Applications

PTI

1. Connect to n data sources 2. Get Data * n3. Combine * n4. Transform * n5. Analyze

JSON

XML

Redundancy: Underlined steps has to be performed by every application for each type of data

Page 8: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Making data accessible is a big challenge

Operational or Planning ApplicationsGOSS

1. Connect to n data sources 2. Get Data3. Combine4. Transform

1. Connect to GOSS2. Request Data3. Analyze

Organizing and converting data to application specific formats

SQL TablesNoSQL

CSV

Text files

PI

Time series

PTI

JSON

XML

GOSS = GridOPTICS Software System, https://github.com/GridOPTICS/GOSS

Page 9: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

GOSSTM: link data to applications

9

https://github.com/GridOPTICS/GOSS

Page 10: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Analytical challenges in multi-domain data-driven reasoning

10

Page 11: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Computational challenges in keeping up with data cycles

Dynamic state estimation: computational performance achieved 30ms target for regional systems (1000s buses). WECC-size system (16,000 buses) – a 100 TF problem with 48ms performance on 3,000 cores. Remaining bottleneck is communication. Memory bandwidth advancements expect to meet the 30ms target.

11

0.0001

0.001

0.01

0.1

1

10

100

1000

10000

 0.003 0.030 0.300 3.000 30.000

TeraFLOPS

Seconds

Mar 2017

30 ms

Oct 2012June 2014

Oct 2012June 2014

Computational time per estimation step

Page 12: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Mathematical challenges in handling non-Gaussian noise in power grid measurement

Current data applications assume Gaussian noise. New mathematics needs to be developed and adapted for handling non-Gaussian noise, such as Particle Filters, Gaussian Mixture Methods.

12

-0.22 -0.2 -0.18 -0.16 -0.14 -0.12 -0.10

20

40

60

80

100

120Histogram (pValue=0.00%)

Freq

uenc

yMeasurements

pValue=0.00%

Noise extracted from PMU Noise property analysis

Page 13: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Advanced visualization for improving hydrostate awareness (Hydromap)

Modernize displays for hydro planning and operationsDevelop new, novel visualization techniques and paradigms for analyzing dynamic dataDevelop modular framework for deploying and integrating new data visualizations

13

Current data display in need of modernization

Interactive hydro map as a new visualization paradigm

Page 14: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Multi-dimensional wind visualization (Glyphs)

14

Wind visualization showing multi-dimensional data in glyphs

Wind speed (length of tail)Wind direction (angle of tail)Generation (size of head)Uncertainty (color of tail)Forecast variability Wholesale price CapacityLast hour generationSCE error codeGeneration difference from forecast

Wind Visualization/Wind Forecast Visualization

Page 15: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Historical hydro view using radial visualization

15

Compares hydropower generation across different projects along Columbia and Snake riversAlternative view shows generation across different sources such as hydropower, nuclear, renewables, and miscellaneous sources

Page 16: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Data repository for public hosting

16

Data Tools

Web PortalData Repository

Download Processing

Generation Processing

User

Use existing datasets

Generate new datasets

Submit datasets

Submission Processing

Validated models & scenarios User‐generated 

models & scenarios

Existing models & scenarios

Dataset Generation/

Anonymization Methods

Dataset Metrics/

Validation

Download request

Case configuration

3rd-party datasets

Private Datasets

Data Repository

Anonymized DataCase Published New Case Request

Data Generation

Page 17: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

DR POWER: ARPA-E Data Repository

17

https://egriddata.org

Page 18: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

ARPA-E Grid Optimization Competition (open now)

18

https://gocompetition.energy.gov

Page 19: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Summary

Grid data complexity is increasing with big volumes, diverse types, and various attributes. Such complexity poses significant challenges in data access, transformation, analytics, sense making. Math, computing and visualization technologies need to be developed to meet these challenges.

GOSS as an example big data platform. Multi-domain data reasoning and high performance computing. Modular visualization for information presentation.

19

Page 20: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Acknowledgement

PNNL Researchers: (Data and Computing) Poorva Sharma, Steve Elbert, Shuangshuang Jin, Bruce Palmer, George Chin; (Power Engineering) Yousu Chen, Mark Rice, Shaobu Wang, Karen StudarusFormer PNNL Researchers: Ruisheng Diao, Bora Akyol, Terrence Critchlow, Ning Zhou, Ning Lu, Pengwei DuFunding support by:

PNNL Future Power Grid Initiative (FPGI) DOE Office of Electricity Delivery and Energy Reliability (OE) Advanced Grid Modeling ProgramDOE Grid Modernization Initiative DOE Advanced Scientific Computing Research (ASCR) Applied Math ProgramDOE Advanced Research Program Agency – Energy (ARPA-E) Bonneville Power Administration (BPA)

20

Page 21: Big Data Access, Analytics and Sense-Makingyweng2/Tutorial5/pdf/zhen1.pdf · Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEE Laboratory Fellow/Technical Group Manager Pacific Northwest

Questions?

21

Zhenyu (Henry) Huang, Ph.D., P.E., F.IEEELaboratory Fellow/Technical Group ManagerPacific Northwest National [email protected]

Further Information: GridOPTICS: http://gridoptics.pnnl.gov/GridOPTICS™ Software System (GOSS): https://github.com/GridOPTICS/GOSSInteractive Visualization and Demo Center: http://vis.pnnl.gov/DR POWER: ARPA‐E Data Repository: https://egriddata.orgARPA‐E Grid Optimization Competition: https://gocompetition.energy.gov