JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Page 1: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

www.modeliosoft.com

Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Page 2: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Outline

Introduction

Model-driven development

Big Data

Juniper

Case-study results

Conclusions

Page 3: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

SOFTEAM – a French IT services / Software vendor

• SOFTEAM, a growing company

o 25 years’ experience

o 850 experts

o Regular growth

• Specialist in OO technologies, new architectures, methodologies

• Banking, Defense, Telecom, …

[Figure: revenue chart. 17.5 M€ (2005), 20 M€ (2006), 23 M€ (2008), 70 M€ (2013)]

[Figure: office locations. Paris, Rennes, Nantes, Sophia]

Page 4: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Modelio for Software and System Engineering

• UML editor with 20 years’ history

o CloudML

o SysML

o MARTE

o Code generation

o Documentation

o Teamwork

• Available under open source at Modelio.org

Page 5: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

MODEL-DRIVEN DEVELOPMENT

Page 6: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

It is all about models … Starting with UML

Requirements: UML Use Cases

Architecture: UML Components and Classes

Design: Refined Classes or Domain Specific Language

Implementation: Code generation (Java, C++, Frameworks)

Page 7: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Model = Code

Page 8: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Typical example: Control system for a frigate

• 800+ components

• Developed by 100+ engineers

• 1M+ LOC

• MDD fosters Productivity and Quality with

o Code generation

o Components reuse

o Tracing

o Automation

Page 9: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Curious DSL example: Ruby on Rails

HAML

Haml                      HTML
%br{:clear => 'left'}     <br clear="left"/>
%p.foo Hello              <p class="foo">Hello</p>
%p#foo Hello              <p id="foo">Hello</p>
.foo                      <div class="foo">...</div>
#foo.bar                  <div id="foo" class="bar">...</div>

Cucumber and Capybara

Feature: User can manually add movie

  Scenario: Add a movie
    Given I am on the RottenPotatoes home page
    When I follow "Add new movie"
    Then I should be on the Create New Movie page
    When I fill in "Title" with "Men In Black"
    And I should see "Men In Black"

Page 10: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

What do we get from MDD?

Pros

• Design once, deploy everywhere!

• Write your transformation once, transform anything!

Cons

• Transformations are hard to write…

• How to make sure they are CORRECT? i.e.

– Is there any data/semantic loss?

Page 11: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

BIG DATA

Page 12: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Volume, variety, velocity

1. E-mails sent every second: 2.9 million

2. Video uploaded to YouTube every minute: 25 hours

3. Data processed by Google every day: 24 petabytes

4. Tweets per day: 50 million

5. Products ordered on Amazon per second: 73 items

Page 13: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Only 0.5% of data is analyzed

• In 2012, 2,837 EB of data were generated; just 0.5% was actually analyzed.

That still amounts to about 14 EB (14.185 EB, or 14.185 million terabytes).

Source: IDC & EMC
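
The figure on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes decimal (SI) units, where 1 EB equals 1,000,000 TB.

```java
import java.math.BigDecimal;

final class AnalyzedShare {

    /** Returns `percent` percent of `volume`, using exact decimal arithmetic. */
    static BigDecimal percentOf(BigDecimal volume, BigDecimal percent) {
        return volume.multiply(percent).movePointLeft(2);
    }

    /** Converts exabytes to terabytes, assuming SI units (1 EB = 10^6 TB). */
    static BigDecimal exabytesToTerabytes(BigDecimal exabytes) {
        return exabytes.movePointRight(6);
    }

    public static void main(String[] args) {
        BigDecimal generatedEb = new BigDecimal("2837"); // EB generated in 2012, per the slide
        BigDecimal analyzedEb = percentOf(generatedEb, new BigDecimal("0.5"));
        System.out.println(analyzedEb.toPlainString() + " EB analyzed");
        System.out.println(exabytesToTerabytes(analyzedEb).toPlainString() + " TB analyzed");
    }
}
```

BigDecimal is used instead of double so that the result is not affected by binary floating-point rounding.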

Page 14: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

SQL or Hadoop

Do you have some data and a problem to solve? (yes)

Data fits in memory?
  yes: Tons of options. Don’t need database or Hadoop
  no: Data fits on single RAID array?
    yes: Solvable with SQL?
      yes: Use MySQL
      no: Can you program?
        yes: Write a prog.
        no: Dead end
    no: (continued on the next slide)

* Inspired by: Aaron Cordova

Page 15: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

SQL or Hadoop (continued)

Data fits on single RAID array?
  no: Have lots of money?
    yes: Solvable with Oracle SQL?
      yes: Buy a SAN, use Oracle
      no: Do you have a PhD in parallel prog.?
        yes: Roll your own MPI solution
        no: Solvable using MapReduce? (see below)
    no: Solvable using MapReduce?
      yes: Can you program MapReduce jobs?
        yes: Write MapReduce on Hadoop
        no: Dead end
      no: Dead end

* Inspired by: Aaron Cordova
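
The decision flow on the two "SQL or Hadoop" slides can be written down as a small function. This is a sketch of one reading of the flowchart, not a definitive transcription: the extracted slide text does not preserve every arrow, so the branch order below is an assumption, as are all class, field and method names.

```java
final class StorageAdvisor {

    /** Yes/no answers to the flowchart questions. Field names are illustrative. */
    static final class Answers {
        boolean fitsInMemory;
        boolean fitsOnSingleRaidArray;
        boolean solvableWithSql;
        boolean canProgram;
        boolean hasLotsOfMoney;
        boolean solvableWithOracleSql;
        boolean hasParallelProgrammingPhd;
        boolean solvableWithMapReduce;
        boolean canProgramMapReduce;
    }

    /** Walks the questions top to bottom and returns the outcome text. */
    static String advise(Answers a) {
        if (a.fitsInMemory) {
            return "Tons of options: no database or Hadoop needed";
        }
        if (a.fitsOnSingleRaidArray) {
            if (a.solvableWithSql) {
                return "Use MySQL";
            }
            return a.canProgram ? "Write a program" : "Dead end";
        }
        // Data is larger than a single RAID array.
        if (a.hasLotsOfMoney) {
            if (a.solvableWithOracleSql) {
                return "Buy a SAN, use Oracle";
            }
            if (a.hasParallelProgrammingPhd) {
                return "Roll your own MPI solution";
            }
        }
        // Assumed reading: without money, or without a PhD, MapReduce is the remaining route.
        if (a.solvableWithMapReduce && a.canProgramMapReduce) {
            return "Write MapReduce on Hadoop";
        }
        return "Dead end";
    }
}
```

Leaving every answer at `false` ends in "Dead end", which matches the point the next slide makes: outside the SQL range, success depends on money and programming skills.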

Page 16: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Challenges

Hadoop MapReduce is the major trend

Success relies on the programming skills of the personnel

Many problems are not solvable with Hadoop. What about real-time?

MPI for high performance computing is an option when you have lots of money and a PhD

Page 17: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Page 18: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

JUNIPER integrates Big Data technologies over MPI

[Figure: generic Big Data pipeline. Labels: DOCs, Streams, DBs; Data Processing (Stage 1 … Stage N); Analytical DBs; Business Intelligence; Visualization]

[Figure: Data Processing in JUNIPER. Labels: DOCs, db; stages S1, S2, S3 with mpi links; Analytical DBs; Hadoop, HPC, FPGA-enabled nodes]

Page 19: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Modelling in Juniper

[Figure: JUNIPER modelling tool chain. Labels: Models (high-level architecture: nodes, programs, streams…; real-time constraints); Java Code; Code Generation (+ MPI initialization, communication, etc.); Reverse Engineering; Model Export; Configuration; Schedulability Analysis Tool (in progress); Scheduling Advisor (in progress); Measurements & Advice; Deployment Scripts (in progress)]

Page 20: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Mapping Programming Model, UML and MARTE

[Figure: mapping table. Columns: Programming Model, UML, MARTE. Rows: JUNIPER Program, Channel, Cloud Node]

Page 21: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Modelling the application and real-time constraints

Real-time constraints: response time, bandwidth

[Figure labels: Big Data flow; JUNIPER Programs]

Page 22: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Modelling the hardware infrastructure at a high level

[Figure labels: Cloud Node; CPU with 4 cores; Hard drive]

Page 23: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

MPI code generation

[Figure labels: JUNIPER Application Model; Code Generation]
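
To make the idea of model-to-code generation concrete, here is a toy generator. It is a minimal sketch and not the Modelio/JUNIPER generator: it assumes the model can be reduced to a list of program names, that ranks are assigned by list position starting at 0, and that the output skeleton mirrors the `RANK`, `MPI.Init` and `MPI.Finalize` elements visible in the generated class shown later in the deck. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MpiSkeletonGenerator {

    /** Emits the Java source text of one program skeleton with the given rank. */
    static String generate(String programName, int rank) {
        return String.join("\n",
                "public class " + programName + " {",
                "    public static final int RANK = " + rank + ";",
                "",
                "    public static void main(final String[] args) {",
                "        MPI.Init(args);",
                "        // program-specific processing would be generated here",
                "        MPI.Finalize();",
                "    }",
                "}");
    }

    /** Generates one skeleton per program; rank = position in the list (an assumption). */
    static List<String> generateAll(List<String> programNames) {
        List<String> sources = new ArrayList<>();
        for (int rank = 0; rank < programNames.size(); rank++) {
            sources.add(generate(programNames.get(rank), rank));
        }
        return sources;
    }

    public static void main(String[] args) {
        System.out.println(generate("EventProcessor", 1));
    }
}
```

The generator only produces text; compiling the emitted skeleton would additionally require a Java MPI binding on the classpath.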

Page 24: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

PETAFUEL CASE STUDY

Page 25: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Risk: $45 million in half day

Page 26: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

MasterCard debit card approval within 4 sec

petaFuel - transaction approval within 4 seconds

[Figure: processing flow. Labels: Transactions Events Stream; Historical Data (30 GB); Basic Checks; Fraud Patterns Detection; Transaction Approval Decision]
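
The approval flow with its time budget can be sketched in plain Java. This is an illustration under stated assumptions, not petaFuel's implementation: the basic checks, the fraud predicate and the policy of declining when the budget is exceeded are all hypothetical, as are the names. A timeout on a future also gives no hard real-time guarantee; it only bounds how long the caller waits.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

final class ApprovalPipeline {

    enum Decision { APPROVED, DECLINED }

    /**
     * Runs basic checks, then fraud detection, and returns a decision
     * within roughly budgetMillis (4000 for the 4-second requirement).
     */
    static Decision decide(String cardId, long amountCents,
                           Predicate<String> fraudCheck, long budgetMillis) {
        // Stage 1: basic checks (illustrative only).
        if (cardId == null || cardId.isEmpty() || amountCents <= 0) {
            return Decision.DECLINED;
        }
        // Stage 2: fraud pattern detection, run asynchronously so the wait can be bounded.
        CompletableFuture<Boolean> suspicious =
                CompletableFuture.supplyAsync(() -> fraudCheck.test(cardId));
        try {
            return suspicious.get(budgetMillis, TimeUnit.MILLISECONDS)
                    ? Decision.DECLINED : Decision.APPROVED;
        } catch (TimeoutException | ExecutionException e) {
            // Assumed policy: decline if the check is late or fails.
            suspicious.cancel(true);
            return Decision.DECLINED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Decision.DECLINED;
        }
    }
}
```

A call such as `ApprovalPipeline.decide("card-1", 1000L, id -> false, 4000L)` approves the transaction, while a fraud check that runs past the budget leads to a decline.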

Page 27: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Juniper application model

Page 28: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Deployment model

Page 29: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

MPI code generation

public class EventProcessor {

    // MPI rank of this program
    public static final int RANK = 1;

    public static IEventProcessor iEventProcessorImpl = new IEventProcessor() {
        @Override
        public void process(Event event) {
            // Count events per timestamp-derived key in the key-value store
            String key = getKeyFromTimestamp(event.getTimestamp());
            String value = keyValueStoreIKeyValueStore.find(key);
            if (value == null) {
                keyValueStoreIKeyValueStore.put(key, "1");
            } else {
                int count = Integer.parseInt(value);
                keyValueStoreIKeyValueStore.put(key, "" + (count + 1));
            }
        }
    };

    public static void main(final String[] args) {
        MPI.Init(args);
        MPI.Finalize();
    }
}
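
The counting logic inside `process` can be tried without MPI or the JUNIPER runtime. The sketch below is a stand-alone approximation: an in-memory `HashMap` replaces the key-value store, and `keyFromTimestamp` buckets events per minute, which is purely an assumption because the slide does not show how `getKeyFromTimestamp` derives its key. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EventCounterSketch {

    // Stand-in for the key-value store used by the generated code.
    private final Map<String, String> store = new HashMap<>();

    /** Assumed bucketing: one key per minute of epoch time. */
    static String keyFromTimestamp(long epochMillis) {
        return Long.toString(epochMillis / 60_000L);
    }

    /** Increments the string-encoded counter for the event's bucket. */
    void process(long timestampMillis) {
        String key = keyFromTimestamp(timestampMillis);
        String value = store.get(key);
        if (value == null) {
            store.put(key, "1");
        } else {
            store.put(key, Integer.toString(Integer.parseInt(value) + 1));
        }
    }

    /** Returns the stored count for the bucket of the given timestamp, or null. */
    String countFor(long timestampMillis) {
        return store.get(keyFromTimestamp(timestampMillis));
    }

    public static void main(String[] args) {
        EventCounterSketch counter = new EventCounterSketch();
        counter.process(0L);
        counter.process(59_999L);
        counter.process(60_000L);
        System.out.println(counter.countFor(0L));      // two events in the first minute
        System.out.println(counter.countFor(60_000L)); // one event in the second minute
    }
}
```

Counts are kept as strings only to mirror the slide's code; a `Map<String, Integer>` with `merge(key, 1, Integer::sum)` would be the more idiomatic choice in new code.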

Page 30: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

CONCLUSIONS

Page 31: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Juniper trade-offs

                                                      JUNIPER
Criteria                  Hadoop                      MPI
Communication             HDFS - file system (httpd)  HPC cluster interconnect (InfiniBand)
Data flow                 MapReduce                   Modeling + MPI comms
Parallelization           Automatic                   Manual, based on domain decomposition
Response time guarantees  None                        Real-time for single node
Stages in multi-format    No                          Any (incl. Hadoop + FPGA)
Hardware                  Commodity cluster           HPC cluster
Price                     €                           €€€
Skills                    +                           ++++
Customers                 General audience            Critical systems

Page 32: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Prospects – more work

Work in progress

UML based language

• MPI Communication

• Timing properties

• Deployment

petaFuel case study

Future work

Modelling payload

Integrating schedulability

Running final evaluations

Final release

Page 33: JUNIPER: Towards Modeling Approach Enabling Efficient Platform for Heterogeneous Big Data Analysis

Questions?

Andrey Sadovykh

Marcos Almeida

SOFTEAM | ModelioSoft

{name.surname}@softeam.fr

SOFTEAM R&D Web Site: http://rd.softeam.com

Modelio Web Site: http://www.modelio.org

JUNIPER Web Site: http://www.juniper-project.org

* for your questions