
Java in the database–is it really useful? Solving impossible Big Data challenges


Page 1: Java in the database–is it really useful? Solving impossible Big Data challenges

© 2015 Rogue Wave Software, Inc. All Rights Reserved.

Java in the database–is it really useful? Solving impossible Big Data challenges

Wendy Hou, Product Manager
Mark Sweeney, Sales Engineer

Page 2: Java in the database–is it really useful? Solving impossible Big Data challenges

Why embed analytics?

• Faster and more efficient
  – Data extraction can take a large percentage of the analysis time
  – Business users can get results by changing a few variables and rerunning the models, without depending on others to implement changes and rerun
  – Real time, on demand, without synchronization delay
• Simpler and greater volume
  – Simpler user experience
  – Able to analyze larger data sets
• Lower cost
  – Opportunity cost
  – Cost of maintaining the analytic infrastructure (HW, SW, staff, maintenance, platforms)

Page 3: Java in the database–is it really useful? Solving impossible Big Data challenges

Why embed analytics in DB

• Accuracy and accessibility
  – Data and formula in one place avoids potential user errors
  – Invoke data and analytics from any programming language or application that can connect to the database
• Higher security: data used as input to the analytics never leaves the database

Page 4: Java in the database–is it really useful? Solving impossible Big Data challenges


What can you use?

JMSL is the pure Java member of the IMSL family

Page 5: Java in the database–is it really useful? Solving impossible Big Data challenges


Diverse data management world

[Diagram labels: SQL, NoSQL, Hadoop, MapReduce, Spark, Java, JavaScript, In-memory, On-disk]

Page 6: Java in the database–is it really useful? Solving impossible Big Data challenges

Taxonomy of DB analytics

[Diagram: placement of analytics and executables across four architectures: Proprietary Platform, Multitier, Distributed Platform, Database]

• Analytics invoked externally but run in-server or in-database. Includes in-memory DBs.
• Stored data and analytics are physically separated. Architecture could vary.

Page 7: Java in the database–is it really useful? Solving impossible Big Data challenges


In-database JMSL

• Analytics run on DB’s internal JVM

• JMSL classes stored as DB objects

• Highly portable, identical code runs cross-platform


Page 8: Java in the database–is it really useful? Solving impossible Big Data challenges

[Comparison table; per-cell ratings did not survive extraction]

Approaches compared (columns): Proprietary Platform, Multitier, Distributed, Database Analytics, In-database JMSL

Execution technologies (in slide order): SAS, MATLAB, others; Windows, Linux; Hadoop, Cassandra-Spark; SAP HANA, Oracle Advanced Analytics; Database

Criteria (rows):
• Non-proprietary language
• Efficient data transfer
• Distributed/Scalable
• Secure
• Portable/Reusable
• Algorithm coverage
• Performance
• Low cost (with setup)

Page 9: Java in the database–is it really useful? Solving impossible Big Data challenges


The solution

Page 10: Java in the database–is it really useful? Solving impossible Big Data challenges


Challenge: Meet all requirements

In-database JMSL is uniquely positioned to solve the technical and practical challenges for DB analytics.

• Pure Java
• Minimizes network traffic
• Distributed/Scalable
• Highly Secure
• Portable/Reusable
• Algorithm Coverage
• High Performance
• Low Cost

Page 11: Java in the database–is it really useful? Solving impossible Big Data challenges

Benefits of in-database JMSL

• Faster results
• Higher accuracy
• Better quality of data
• Higher security
• Greater accessibility

Additionally:
• Trusted technology: JMSL is a known and proven product
• Minimal risk: works with many platforms without modification

Page 12: Java in the database–is it really useful? Solving impossible Big Data challenges


Data quality and accuracy

• JMSL has numerous data cleaning routines for numerical data

– Eliminate data staging before loading

• Reducing network traffic reduces risk of data corruption

• Data and formula in one place - avoids potential user errors

Page 13: Java in the database–is it really useful? Solving impossible Big Data challenges

Security

• Java implementation
• Analytics run in DB process space, not as an external procedure
• Core data never on network for analytics
• DB privileges can be fine tuned: access to run analytics but not to underlying data

Page 14: Java in the database–is it really useful? Solving impossible Big Data challenges

Ease of use, accessibility

• JMSL installation to the DB is extremely easy
• Developers only need to write SQL/Java interfaces to JMSL routines
• Analytics invoked from any language that can connect to the DB
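As one illustration of invoking in-database analytics from a client, the sketch below calls a stored function through standard JDBC. It is not code from the presentation: the class `AnalyticsClient` and the stored function name `z_score` are hypothetical, and registering the result as `Types.DOUBLE` is an assumption that may need adjusting for a particular database.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

/** Client-side sketch: invoking an in-database analytic through JDBC. */
class AnalyticsClient {

    /** Builds the JDBC escape syntax for a stored function with nArgs inputs. */
    static String functionCall(String functionName, int nArgs) {
        StringBuilder sql = new StringBuilder("{? = call ")
                .append(functionName)
                .append('(');
        for (int i = 0; i < nArgs; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")}").toString();
    }

    /**
     * Calls a hypothetical stored function z_score(x, mean, stdDev).
     * Only the three inputs and the single result cross the network;
     * the computation itself runs inside the database.
     */
    static double callZScore(Connection conn, double x, double mean, double stdDev)
            throws SQLException {
        try (CallableStatement stmt = conn.prepareCall(functionCall("z_score", 3))) {
            stmt.registerOutParameter(1, Types.DOUBLE); // parameter 1 is the return value
            stmt.setDouble(2, x);
            stmt.setDouble(3, mean);
            stmt.setDouble(4, stdDev);
            stmt.execute();
            return stmt.getDouble(1);
        }
    }
}
```

Any client with a database driver can issue the same call; the Java client here is only one example.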

Page 15: Java in the database–is it really useful? Solving impossible Big Data challenges

Trusted technology

• In-database JMSL leverages known stable technologies
  – Java
  – SQL
• Does not require learning the latest, greatest programming language
• Does not require learning a new ecosystem

however …
• Only requirement is a JVM
  – Integrates with the new ecosystems
  – Callable by Scala, Groovy, Clojure, etc.
  – Supported in many JavaScript engines

Page 16: Java in the database–is it really useful? Solving impossible Big Data challenges


The details

Page 17: Java in the database–is it really useful? Solving impossible Big Data challenges

JMSL under the hood

• Pure Java
• 100s of classes
• Part of IMSL family
• Extensive documentation
• Well supported

[Diagram: JMSL architecture]

Page 18: Java in the database–is it really useful? Solving impossible Big Data challenges

Architecture

[Diagram: a server hosting the database and external processes]

• Database storage: SQL subprogram, Java class, JMSL, data
• DB process execution: SQL Interpreter, SQL Engine, Java Virtual Machine (JMSL routines run here)
• External processes on the server: some analytics packages run here

Page 19: Java in the database–is it really useful? Solving impossible Big Data challenges

JMSL and SQL: not a paradigm shift

Targeting respective strengths

• Java introduced as RDBs grew into their modern form
• JDBC was introduced in JDK 1.1 (1997)
• Direct mappings of fundamental SQL data types in Java
• Internal DB JVM allows seamless integration between Java and SQL
• Leverages stable, familiar technologies

In the database use SQL and JMSL for their respective strengths.

• SQL: queries, DDL, DML
• JMSL: advanced analytics
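The direct mapping between SQL and Java types can be illustrated with the standard `java.sql.JDBCType` enum. The sketch below lists a few of the default object mappings defined by the JDBC specification. The class name `SqlTypeMapping` and the choice of types are illustrative only, and individual databases may map their own types (for example a generic NUMBER) differently.

```java
import java.sql.JDBCType;
import java.util.EnumMap;
import java.util.Map;

/** A few default SQL-to-Java object mappings from the JDBC specification. */
class SqlTypeMapping {

    static final Map<JDBCType, Class<?>> JAVA_TYPE = new EnumMap<>(JDBCType.class);

    static {
        JAVA_TYPE.put(JDBCType.INTEGER, Integer.class);
        JAVA_TYPE.put(JDBCType.BIGINT, Long.class);
        JAVA_TYPE.put(JDBCType.REAL, Float.class);
        JAVA_TYPE.put(JDBCType.DOUBLE, Double.class);
        JAVA_TYPE.put(JDBCType.NUMERIC, java.math.BigDecimal.class);
        JAVA_TYPE.put(JDBCType.VARCHAR, String.class);
        JAVA_TYPE.put(JDBCType.DATE, java.sql.Date.class);
        JAVA_TYPE.put(JDBCType.TIMESTAMP, java.sql.Timestamp.class);
    }

    /** Returns the Java class for the given SQL type, or Object if it is not listed here. */
    static Class<?> javaTypeFor(JDBCType sqlType) {
        return JAVA_TYPE.getOrDefault(sqlType, Object.class);
    }
}
```

Because numeric SQL columns arrive as ordinary Java numbers, they can be passed to numerical routines without a separate conversion layer.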

Page 20: Java in the database–is it really useful? Solving impossible Big Data challenges


It’s so easy even I could do it

Page 21: Java in the database–is it really useful? Solving impossible Big Data challenges


First step, install JMSL to the DB

… that’s it

Page 22: Java in the database–is it really useful? Solving impossible Big Data challenges

JMSL classes as DB objects

[Screenshot: installed JMSL classes]

• All dependencies resolved
• Nearly 200 JMSL classes

Page 23: Java in the database–is it really useful? Solving impossible Big Data challenges

UDF steps

1. Write UDF as Java static method
   a) Compile to byte code
   b) Load class file to DB
2. Write SQL call specification for UDF
   a) Not a wrapper (no extra execution layer)
   b) Maps Java and SQL types
   c) Saved as SQL stored procedure
3. Use stored procedure for in-DB analytics
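The three steps above can be sketched as follows. This is a minimal illustration, not code from the presentation: the class `DemoUdf`, the method `zScore`, and the SQL name `z_score` are hypothetical, and plain arithmetic stands in for a call to a JMSL routine. The call specification shown in the comment assumes Oracle-style syntax; other databases with an internal JVM use their own syntax.

```java
/**
 * Step 1: a UDF written as a public static Java method.
 * DemoUdf and zScore are hypothetical names. The class is package-private
 * here for brevity; a database loader may require it to be public.
 */
class DemoUdf {

    /** Standardizes x against a mean and a standard deviation. */
    public static double zScore(double x, double mean, double stdDev) {
        if (stdDev <= 0.0) {
            throw new IllegalArgumentException("stdDev must be positive");
        }
        // A real UDF would invoke a JMSL routine at this point.
        return (x - mean) / stdDev;
    }

    /*
     * Step 2: SQL call specification (Oracle-style syntax, assumed).
     * It maps SQL NUMBER arguments to Java doubles and publishes the
     * method under a SQL name, with no extra wrapper layer:
     *
     *   CREATE OR REPLACE FUNCTION z_score(x NUMBER, mu NUMBER, sigma NUMBER)
     *   RETURN NUMBER
     *   AS LANGUAGE JAVA
     *   NAME 'DemoUdf.zScore(double, double, double) return double';
     *
     * Step 3: use it like any other SQL function, for example
     *   SELECT z_score(reading, 10, 2) FROM measurements;
     * (the measurements table and reading column are hypothetical).
     */
}
```

Keeping the method static and limited to types with direct SQL counterparts is what lets the call specification stay a pure type mapping.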

Page 24: Java in the database–is it really useful? Solving impossible Big Data challenges

Java UDFs stored with SQL alias

[Screenshot callouts: (1) Java UDF as DB object, (2) SQL call spec., (3) AutoARIMA output]

Page 25: Java in the database–is it really useful? Solving impossible Big Data challenges

Code snippet

public static java.sql.Array AA1(ResultSet rs, int nrows, int nforecast)
        throws SQLException {

    java.sql.Array array = null;
    // … skipped lines of data prep

    // 2D array to hold AutoARIMA output
    double[][] darr = new double[7][n+1];
    // instantiate JMSL object
    AutoARIMA autoArima = new AutoARIMA(t, x);
    // … skipped lines of data processing

    // create a varray of varrays with the double[][] data
    array = RWArrayOut.varrVarrOut(darr);
    return array;

} // from RWAutoArima.java

Page 26: Java in the database–is it really useful? Solving impossible Big Data challenges


Summary

Page 27: Java in the database–is it really useful? Solving impossible Big Data challenges

Java in the DB is more than useful … when combined with JMSL

• Non-proprietary language
• Efficient data transfer
• Distributed/scalable
• Secure
• Portable/reusable
• Extensive collections of algorithms
• Performance
• Low cost and easy to implement

Page 28: Java in the database–is it really useful? Solving impossible Big Data challenges

Additional resources

• White papers available at roguewave.com
  – Tech tutorial: Embedding analytics into a database using JMSL
  – Using JMSL in Hadoop MapReduce applications
  – Time series analysis Auto Arima
  – and many others
• JMSL Manual and API available at roguewave.com
• Rogue Wave Professional Services
  – Development of high performance applications
  – Migration services
  – Assistance with Rogue Wave products

Page 29: Java in the database–is it really useful? Solving impossible Big Data challenges
