Data Science in Combustion: an Introduction. 1st International [Brainstorming] Workshop on Combustion Data Science, 5/20-21/2015

CombustionDataScienceIntro EDames


Page 1: CombustionDataScienceIntro EDames

Data Science in Combustion – an Introduction

1st International [Brainstorming] Workshop on Combustion Data Science

5/20-21/2015

Page 2: CombustionDataScienceIntro EDames

Objectives

• Communicate to the audience the relationship of combustion science to data science.

Why? Combustion scientists can learn extensible, transferable skills that are valued in the growing ‘data science’ industry. This not only motivates others to take part, but also encourages them to bring into the field tools from the data science community that they would otherwise be unaware of.

• An introductory lecture at any workshop/short course should seek to give an overview of:

• Clarification and definition of data science

• Data science as an industry and skill

• The data intensive aspects of combustion [physics and transport] research

• The history of data science in combustion

• Current efforts/projects in combustion data science

• The audience should leave the short course with greater awareness of what the community has to offer, its current status and efforts, and what they can do to advance it.

• We hope to reach the ‘tipping point’ into active and prolonged participation from the community (buy-in), especially from junior researchers.

Page 3: CombustionDataScienceIntro EDames

‘Data scientists’ are in high demand, and more of them are hard scientists than computer scientists: physicists, chemists, and engineers who hack their way to solutions, with experience in programming and analytics.

Page 4: CombustionDataScienceIntro EDames

The combustion community can and should capitalize on the current data science craze

There are many industry formats and tools that can guide analytics and data archival for combustion research, including:

• Databases and their best practices

• Community organization and communication

The amount of data in combustion, as in all sciences, is increasing. We need a way to archive and manage it intelligently, both to avoid error propagation and to progress more efficiently as a community.

Page 5: CombustionDataScienceIntro EDames

www.datacamp.com (2014)

Page 6: CombustionDataScienceIntro EDames

Aspects of ‘Data Science’

• Definition and delineation of objects and attributes

• Data Storage, Formats/Schemas

• Mining

• Evaluation and statistics

• Machine learning and predictive analytics

• Analytics, Software, and processing

• Collaboration in a global sense

• Community

Page 7: CombustionDataScienceIntro EDames

Examples of data-intensive, combustion-related projects

• PrIMe = Process Informatics Modeling [Environment]

• NIST kinetics database

• Combustion codes and tools: RMG, ExGas, AcTcT, Cloudflame, OpenSmoke, Cantera, OpenFOAM

• Fields I’ve left out but that should also be addressed: Turbulent Combustion, Catalysis, Laser Diagnostics

http://rmg.mit.edu/

http://primekinetics.org

Page 8: CombustionDataScienceIntro EDames

The status of combustion data science via the combustion cyberinfrastructure initiative

The mission: combustion data cyberinfrastructure for probing, archiving, and using experimental and computed data so the combustion community can accomplish its scientific and technological objectives more effectively, as well as helping shape multidisciplinary data collaboration as a new paradigm for research and development. Support: NSF, …

The demand for a cyberinfrastructure is clear

Kinetics: there are too many models with major inconsistencies, so progress and conflict resolution are slow.

Turbulent flow: enabling comparisons between models.

Page 9: CombustionDataScienceIntro EDames

Benefits of a global cyberinfrastructure

• Facilitation of advances in the community through guiding/identifying experiments and modeling needs

• Data transparency

• Quicker feedback through an active online community (as opposed to peer-reviewed publications)

Benefits of a cyberinfrastructure/data-focused workshop or summer program

• Similar to the CEFRC school, facilitate collaboration and dissemination of info and ideas

• Students need resources above and beyond their professors

• Provide guidance, direction, and standards to the community

• Students and junior scientists can ‘play’ around with data

Page 10: CombustionDataScienceIntro EDames

Combustion cyberinfrastructure framework

Page 11: CombustionDataScienceIntro EDames

Portal-warehouse framework

Distributed Warehouse

Cloudflame/Cantera

RMG

PrIMe

OpenSMOKE

Portals and workflows:

•Subjective

•Software and tools for making and testing models

•Data access/visualization

Warehouse:

•Archive

•Format

•Transparent discussion and peer review

Page 12: CombustionDataScienceIntro EDames

Cloud Computing and storage

Page 13: CombustionDataScienceIntro EDames

What any short course should include

Examples and illustrations of:

• Large datasets

• Data formats

• Critical data evaluation and analysis (where combustion knowledge is needed)

• Uncertainty quantification (UQ)

• The history of data in combustion science

• Current efforts – where are the portals and warehouses and what can they do?

Page 14: CombustionDataScienceIntro EDames

Large data sets – examples

• Experimental data (turbulent flows, laser diagnostic data, shock tube)

• Species absorption cross sections (HITRAN)

• Results of ab initio calculations

• Thermochemical data (AcTcT)

• Chemical mechanisms/models (LLNL models)

• Mechanism/model validation experimental target sets

• Species and elementary reaction attributes

Page 15: CombustionDataScienceIntro EDames

Data formats – an example

Species representation can be confusing and people have preferences: Toluene/CH3C6H5/C7H8/etc.

RMG species representation through an adjacency list:

1 C u0 p0 c0 {2,D} {6,S} {7,S}
2 C u0 p0 c0 {1,D} {3,S} {8,S}
3 C u0 p0 c0 {2,S} {4,D} {9,S}
4 C u0 p0 c0 {3,D} {5,S} {10,S}
5 C u0 p0 c0 {4,S} {6,D} {11,S}
6 C u0 p0 c0 {1,S} {5,D} {12,S}
7 C u0 p0 c0 {1,S} {13,S} {14,S} {15,S}
8 H u0 p0 c0 {2,S}
9 H u0 p0 c0 {3,S}
10 H u0 p0 c0 {4,S}
11 H u0 p0 c0 {5,S}
12 H u0 p0 c0 {6,S}
13 H u0 p0 c0 {7,S}
14 H u0 p0 c0 {7,S}
15 H u0 p0 c0 {7,S}

InChI: InChI=1S/C7H8/c1-7-5-3-2-4-6-7/h2-6H,1H3

SMILES: Cc1ccccc1

Other names and identifiers: methylbenzene 3101-08-4 50643-04-4 108-88-3 TOLUENE antisal 1a methacide methylbenzol monomethyl benzene phenyl methane tolu-sol toluol 314358_SIGMA Toluene-(ring-UL-14C) Toluolo [Italian] Benzene, methyl 89677_FLUKA UN 1294 UN1294 89680_FLUKA 89681_FLUKA WLN: 1R 34866_SIAL 155004_SIAL MBN C01455 179965_ALDRICH 32249_RIEDEL NSC406333 Toluene [UN1294] [Flammable liquid] Tolueen Otoline 322245_SIGMA Toluen NCGC00090939-02 Toluolo 676756_SIAL c0114 methyl-Benzene ST5214497 48572_SUPELCO 179418_SIAL 34413_RIEDEL
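To show why a machine-readable species format helps, here is a minimal sketch that parses the toluene adjacency list from this slide and recovers the formula C7H8 from it. The parser is written only for this example and is not RMG's own code: the function names (`parse_adjacency_list`, `molecular_formula`, `valence`) are hypothetical, and the `u0 p0 c0` fields are simply skipped.

```python
from collections import Counter

# Toluene adjacency list copied from the slide.
TOLUENE = """\
1 C u0 p0 c0 {2,D} {6,S} {7,S}
2 C u0 p0 c0 {1,D} {3,S} {8,S}
3 C u0 p0 c0 {2,S} {4,D} {9,S}
4 C u0 p0 c0 {3,D} {5,S} {10,S}
5 C u0 p0 c0 {4,S} {6,D} {11,S}
6 C u0 p0 c0 {1,S} {5,D} {12,S}
7 C u0 p0 c0 {1,S} {13,S} {14,S} {15,S}
8 H u0 p0 c0 {2,S}
9 H u0 p0 c0 {3,S}
10 H u0 p0 c0 {4,S}
11 H u0 p0 c0 {5,S}
12 H u0 p0 c0 {6,S}
13 H u0 p0 c0 {7,S}
14 H u0 p0 c0 {7,S}
15 H u0 p0 c0 {7,S}
"""

# Bond labels used in the list above: S = single, D = double (T assumed triple).
BOND_ORDER = {"S": 1, "D": 2, "T": 3}


def parse_adjacency_list(text):
    """Return {atom_index: (element, {neighbor_index: bond_label})}."""
    atoms = {}
    for line in text.strip().splitlines():
        tokens = line.split()
        index, element = int(tokens[0]), tokens[1]
        bonds = {}
        for token in tokens[2:]:
            if token.startswith("{"):  # bond entries look like {2,D}
                neighbor, label = token.strip("{}").split(",")
                bonds[int(neighbor)] = label
        atoms[index] = (element, bonds)
    return atoms


def molecular_formula(atoms):
    """Build a formula string with C first, H second, then other elements."""
    counts = Counter(element for element, _ in atoms.values())
    ordered = [e for e in ("C", "H") if e in counts]
    ordered += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in ordered)


def valence(atoms, index):
    """Sum of bond orders around one atom."""
    return sum(BOND_ORDER[label] for label in atoms[index][1].values())


def is_symmetric(atoms):
    """Check that every bond i-j is also listed as j-i with the same label."""
    return all(
        atoms[j][1].get(i) == label
        for i, (_, bonds) in atoms.items()
        for j, label in bonds.items()
    )


toluene = parse_adjacency_list(TOLUENE)
print(molecular_formula(toluene))
```

Checks such as `is_symmetric` and `valence` are the kind of automated consistency test that free-text species names cannot support.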

Page 16: CombustionDataScienceIntro EDames

Data interpretation and evaluation

• Critical analysis of available data is important

• Baulch et al. rate evaluations and compilations

H2+OH = H2O+H

[Figure: rate coefficient k3 (cm3/mol-s, log scale from 10^9 to 10^15) versus 1000K/T (0 to 4) for H2+OH = H2O+H. Data sets: Frank & Just 1985; Ravishankara et al. 1981; Tully & Ravishankara 1980; Davidson et al. 1988; Oldenborg et al. 1992; Krasnoperov & Michael 2004; Michael & Sutherland 1988; Nguyen et al. 2011; Orkin et al. 2006; Talukdar et al. 1996. A second panel (k3 from 10^12 to 10^13, 1000K/T from 0.4 to 1.0) shows Michael & Sutherland 1988 with an upper bound (x2) and a lower bound (/2).]
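The upper bound (x2) and lower bound (/2) marked on the plot amount to a factor-of-two uncertainty band around a reference rate coefficient. A minimal sketch of that check follows, assuming a modified Arrhenius form for the reference. The parameter values `A`, `n`, and `Ea` below are hypothetical placeholders, not an evaluated rate for H2+OH, and the function names are invented for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def arrhenius(T, A, n, Ea):
    """Modified Arrhenius expression k = A * T**n * exp(-Ea / (R*T)).

    T in K, Ea in kcal/mol; k takes the units of A.
    """
    return A * T**n * math.exp(-Ea / (R_KCAL * T))


def uncertainty_band(k, factor=2.0):
    """Return (lower, upper) bounds for a multiplicative uncertainty factor."""
    return k / factor, k * factor


def within_band(k_measured, k_reference, factor=2.0):
    """True if a measurement lies inside the band around the reference."""
    lower, upper = uncertainty_band(k_reference, factor)
    return lower <= k_measured <= upper


# Hypothetical placeholder parameters, chosen only to exercise the code.
A, n, Ea = 1.0e8, 1.5, 3.0

k_ref = arrhenius(1500.0, A, n, Ea)
print(within_band(1.5 * k_ref, k_ref))  # a point 50% above the reference
print(within_band(2.5 * k_ref, k_ref))  # a point 150% above the reference
```

Screening each published data set against such a band is one simple way to flag measurements that need a closer look before they enter a compilation.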

Page 17: CombustionDataScienceIntro EDames

Data-intensive chemical mechanisms/models with parametric uncertainty

Page 18: CombustionDataScienceIntro EDames

Uncertainty quantification and minimization

Optimized (posterior) Unoptimized (prior)
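As a rough illustration of how a prior (unoptimized) uncertainty narrows to a posterior (optimized) one, the sketch below applies a textbook Gaussian conjugate update to a single parameter such as ln k. This is a stand-in chosen for brevity, not the optimization method behind the slide's figure. Treating an uncertainty factor as a two-sigma bound on ln k is also an assumption, and all numbers are hypothetical.

```python
import math


def factor_to_sd(factor):
    """Convert a multiplicative uncertainty factor to a standard deviation
    of ln k, assuming (for illustration) the factor is a two-sigma bound."""
    return math.log(factor) / 2.0


def gaussian_update(prior_mean, prior_sd, obs, obs_sd):
    """Combine a Gaussian prior with one Gaussian measurement.

    Precisions (1/variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the observation.
    """
    prior_prec = 1.0 / prior_sd**2
    obs_prec = 1.0 / obs_sd**2
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical numbers: prior on ln k with a factor-of-2 uncertainty,
# one measurement with a factor-of-1.3 uncertainty.
prior_mean, prior_sd = 0.0, factor_to_sd(2.0)
obs, obs_sd = 0.2, factor_to_sd(1.3)

post_mean, post_sd = gaussian_update(prior_mean, prior_sd, obs, obs_sd)
print(post_mean, post_sd)
```

The posterior standard deviation is always smaller than both the prior and the measurement standard deviations, which is the narrowing that a prior-versus-posterior comparison is meant to show.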

Page 19: CombustionDataScienceIntro EDames

Possible Action Items

• Further refine the program and curriculum based on results of this workshop

• Reach out to the community to create more awareness

• Solicit a host – when and where?

• A discussion on consolidating data warehouses

Looking forward, we must be aware of and communicate the issues we will encounter:

• Inclusion

• Bias

• Liability

• Permanent distributed hosting of a data warehouse

• Security