Java in the database - is it really useful?
Solving impossible Big Data challenges
Wendy Hou, Product Manager
Mark Sweeney, Sales Engineer
© 2015 Rogue Wave Software, Inc. All Rights Reserved.

Why embed analytics?
• Faster and more efficient
  – Data extraction could take a large percentage of the analysis time
  – Business users can get results by changing a few variables and rerunning the models, without depending on others to implement changes and rerun
  – Real time, on demand, without synchronization delay
• Simpler and greater volume
  – Simpler user experience
  – Able to analyze larger data sets
• Lower cost
  – Opportunity cost
  – Cost of maintaining the analytic infrastructure (HW, SW, staff, maintenance, platforms)

Why embed analytics in DB
• Accuracy and accessibility
  – Data and formula in one place avoids potential user errors
  – Invoke data and analytics from any programming language or application that can connect to the database
• Higher security: data used as input to the analytics never leaves the database
What can you use?
JMSL is the pure Java member of the IMSL family

Diverse data management world
[Figure labels: SQL, NoSQL, Hadoop, MapReduce, Spark, Java, JavaScript, In-memory, On-disk]

Taxonomy of DB analytics
[Diagram: placement of analytics and executables across four architectures: Proprietary Platform, Multitier, Distributed Platform, Database]
• Analytics invoked externally but run in-server or in-database. Includes in-memory DBs.
• Stored data and analytics are physically separated. Architecture could vary.
In-database JMSL
• Analytics run on DB’s internal JVM
• JMSL classes stored as DB objects
• Highly portable, identical code runs cross-platform

Approaches compared: Proprietary Platform, Multitier, Distributed, Database Analytics, In-database JMSL
Execution technologies: SAS, MATLAB, others; Windows, Linux; Hadoop, Cassandra-Spark; SAP HANA, Oracle Advanced Analytics; Database
Criteria:
• Non-proprietary language
• Efficient data transfer
• Distributed/Scalable
• Secure
• Portable/Reusable
• Algorithm coverage
• Performance
• Low cost (with setup)
The solution

Challenge: Meet all requirements
In-database JMSL is uniquely positioned to solve the technical and practical challenges for DB analytics.
• Pure Java
• Minimizes network traffic
• Distributed/Scalable
• Highly Secure
• Portable/Reusable
• Algorithm Coverage
• High Performance
• Low Cost

Benefits of in-database JMSL
• Faster results
• Higher accuracy
• Better quality of data
• Higher security
• Greater accessibility
Additionally:
• Trusted technology: JMSL is a known and proven product
• Minimal risk: works with many platforms without modification
Data quality and accuracy
• JMSL has numerous data cleaning routines for numerical data
– Eliminate data staging before loading
• Reducing network traffic reduces risk of data corruption
• Data and formula in one place - avoids potential user errors

Security
• Java implementation
• Analytics run in DB process space, not as an external procedure
• Core data never on network for analytics
• DB privileges can be fine tuned: access to run analytics but not to underlying data

Ease of use, accessibility
• JMSL installation to the DB is extremely easy
• Developers only need to write SQL/Java interfaces to JMSL routines
• Analytics invoked from any language that can connect to the DB
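As a sketch of the last point, a client could reach an in-database routine through plain JDBC. The function name rw_zscore, its three numeric arguments, and the class name below are hypothetical; the slides do not show a client call. Only standard java.sql calls are used.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

/** Hypothetical client-side helper for calling an analytics function stored in the database. */
final class InDbAnalyticsClient {
    private InDbAnalyticsClient() {}

    /** Builds the JDBC escape syntax for calling a stored function with argCount arguments. */
    static String functionCall(String functionName, int argCount) {
        StringBuilder sql = new StringBuilder("{? = call ").append(functionName).append('(');
        for (int i = 0; i < argCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")}").toString();
    }

    /**
     * Calls the (hypothetical) stored function rw_zscore over an open connection.
     * Only the three scalar arguments and the scalar result cross the network.
     */
    static double callZScore(Connection conn, double value, double mean, double stdDev)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(functionCall("rw_zscore", 3))) {
            cs.registerOutParameter(1, Types.DOUBLE); // parameter 1 is the return value
            cs.setDouble(2, value);
            cs.setDouble(3, mean);
            cs.setDouble(4, stdDev);
            cs.execute();
            return cs.getDouble(1);
        }
    }
}
```

Any other language with a database driver could issue the equivalent call; nothing in the client depends on the analytics library itself.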

Trusted technology
• In-database JMSL leverages known stable technologies
  – Java
  – SQL
• Does not require learning the latest, greatest programming language
• Does not require learning a new ecosystem
however …
• Only requirement is a JVM
  – Integrates with the new ecosystems
  – Callable by Scala, Groovy, Clojure, etc.
  – Supported in many JavaScript engines
The details

JMSL under the hood
• Pure Java
• 100s of classes
• Part of IMSL family
• Extensive documentation
• Well supported
[Figure: JMSL architecture]

Architecture
[Diagram labels: Database storage (SQL subprogram, Java class, JMSL, data); DB process execution (SQL Engine, SQL Interpreter, Java Virtual Machine); Server external processes; Database]
• JMSL routines run in the Java Virtual Machine inside the DB process
• Some analytics packages run in external processes on the server

JMSL and SQL: not a paradigm shift
Targeting respective strengths
• Java introduced as RDBs grew into their modern form
• JDBC was introduced in JDK 1.1 (1997)
• Direct mappings of fundamental SQL data types in Java
• Internal DB JVM allows seamless integration between Java and SQL
• Leverages stable, familiar technologies
In the database use SQL and JMSL for their respective strengths.
• SQL: queries, DDL, DML
• JMSL: advanced analytics
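To illustrate the type-mapping bullet, the sketch below lists a few of the default SQL-to-Java object mappings defined by the JDBC specification. The class and method names are invented for illustration, and individual databases may map vendor-specific types differently.

```java
import java.math.BigDecimal;
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative lookup of some standard JDBC mappings from SQL types to Java object types. */
final class SqlJavaTypeMap {
    private static final Map<Integer, Class<?>> MAPPINGS = new LinkedHashMap<>();

    static {
        MAPPINGS.put(Types.INTEGER, Integer.class);
        MAPPINGS.put(Types.BIGINT, Long.class);
        MAPPINGS.put(Types.REAL, Float.class);
        MAPPINGS.put(Types.DOUBLE, Double.class);
        MAPPINGS.put(Types.NUMERIC, BigDecimal.class);
        MAPPINGS.put(Types.VARCHAR, String.class);
        MAPPINGS.put(Types.DATE, java.sql.Date.class);
        MAPPINGS.put(Types.TIMESTAMP, java.sql.Timestamp.class);
    }

    private SqlJavaTypeMap() {}

    /** Returns the Java class for a java.sql.Types code, or Object.class if it is not listed here. */
    static Class<?> javaTypeFor(int sqlType) {
        return MAPPINGS.getOrDefault(sqlType, Object.class);
    }
}
```

Because these mappings are fixed by the JDBC specification, a Java routine running in the database can receive SQL values as ordinary Java types without a custom conversion layer.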
It’s so easy even I could do it
First step, install JMSL to the DB
… that’s it

JMSL classes as DB objects
[Figure: installed JMSL classes]
• All dependencies resolved
• Nearly 200 JMSL classes

UDF steps
1. Write UDF as Java static method
   a) Compile to byte code
   b) Load class file to DB
2. Write SQL call specification for UDF
   a) Not a wrapper (no extra execution layer)
   b) Maps Java and SQL types
   c) Saved as SQL stored procedure
3. Use stored procedure for in-DB analytics
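A minimal sketch of these steps, under stated assumptions: the UDF is a plain z-score instead of a JMSL routine so that it compiles without the library, the names RwStatsUdf, rw_zscore, prices, and price are hypothetical, and the call specification in the comment uses Oracle-style syntax, which the slides do not confirm as the target database.

```java
/**
 * Step 1: the UDF is an ordinary static Java method.
 * Declare the class public when loading it into a database; it is kept
 * package-private here so the snippet compiles under any file name.
 */
final class RwStatsUdf {
    private RwStatsUdf() {}

    /** Returns how many standard deviations value lies from mean. */
    public static double zScore(double value, double mean, double stdDev) {
        if (stdDev == 0.0) {
            throw new IllegalArgumentException("stdDev must be non-zero");
        }
        return (value - mean) / stdDev;
    }
}

/*
 * Step 2 (assumed Oracle-style call specification, hypothetical names):
 *
 *   CREATE OR REPLACE FUNCTION rw_zscore (v NUMBER, m NUMBER, s NUMBER)
 *   RETURN NUMBER
 *   AS LANGUAGE JAVA
 *   NAME 'RwStatsUdf.zScore(double, double, double) return double';
 *
 * Step 3: the function is then used like any other SQL function, e.g.
 *
 *   SELECT rw_zscore(price, 100, 15) FROM prices;
 */
```

The call specification only declares the type mapping and the target method, so the SQL function dispatches straight to the Java code.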

Java UDFs stored with SQL alias
[Figure callouts: (1) Java UDF as DB object; (2) SQL call spec.; (3) AutoARIMA output]

Code snippet
public static java.sql.Array AA1(ResultSet rs, int nrows, int nforecast) throws SQLException {
    java.sql.Array array = null;
    // … skipped lines of data prep

    // 2D array to hold AutoARIMA output
    double[][] darr = new double[7][n + 1];
    // instantiate JMSL object
    AutoARIMA autoArima = new AutoARIMA(t, x);
    // … skipped lines of data processing

    // create a varray of varrays with the double[][] data
    array = RWArrayOut.varrVarrOut(darr);
    return array;
} // from RWAutoArima.java
Summary

Java in the DB is more than useful … when combined with JMSL
• Non-proprietary language
• Efficient data transfer
• Distributed/scalable
• Secure
• Portable/reusable
• Extensive collections of algorithms
• Performance
• Low cost and easy to implement

Additional resources
• White papers available at roguewave.com
  – Tech tutorial: Embedding analytics into a database using JMSL
  – Using JMSL in Hadoop MapReduce applications
  – Time series analysis Auto Arima
  – and many others
• JMSL Manual and API available at roguewave.com
• Rogue Wave Professional Services
  – Development of high performance applications
  – Migration services
  – Assistance with Rogue Wave products