Strategies for All Your Data Session id: 40236 Sandeepan Banerjee Vishu Krishnamurthy Oracle Corporation

Page 1

Strategies for All Your Data

Session id: 40236

Sandeepan Banerjee
Vishu Krishnamurthy
Oracle Corporation

Page 2

Where are you spending your money?

Data Management Labor Software Integration Hardware and System Integration

Page 3

Too much information in too many places

Specialty Servers For Different Kinds Of Data
• Relational
• Multimedia
• Documents
• Specialized …
• Messages
• Location
• XML

• Data Isolation
• High Systems Admin And Management Costs
• Scalability Problems
• High Training Costs
• Complex Support Problems

Page 4

One Management System for All Your Data

• Complete
• Integrated
• Robust
• Scalable
• Secure
• Available on all platforms

Oracle interMedia: Multimedia management
Oracle Locator & Spatial: Location and Proximity Searching
Extensibility Framework: Chemical, Genetic, Engineering, …
XML DB: Integrated Native XML Database
Oracle Text & Ultra Search: Text management and search
Relational: Characters, Numbers and Dates
Oracle Collaboration Suite: Unified Messaging and Files

Page 5

What is Oracle XML DB?

• Database support for the XML data model
  – XMLType, XML Schema, DOM Fidelity, XPath, …
• Hierarchical organization of the data
  – WebDAV compliant with indexing for fast access
• Transparent storage optimizations
• Query Language: SQLX and XQuery

Page 6

Classes of XML DB Applications

• Exchanging Structured Documents
  – Well-formed, templated business documents, e.g. Purchase Orders, Phone Bills, …
• Managing Unstructured Documents
  – Documents, Messages, Instructions
• Integrating and normalizing data from diverse sources

Page 7

Structured Document Exchange

• Relational storage remains the “right” way to store highly structured data
• As an XML programmer, you do not want to think about “tables”
  – A hierarchical data model is what you want to manipulate
• XML DB’s XMLType is about preserving the XML paradigm while getting the benefits of relational performance and scalability

Page 8

Structured Document Exchange with Oracle XML DB

• XML data model and APIs familiar to XML programmers
  – XML Schema, Schema Validation, DOM Fidelity
  – JNDI, DOM, XPath, SQLX, XQuery
• Enterprise Class Performance & Scalability
  – Piecewise updates
  – Schema caching
  – Lazy materialization
  – Server-based XSL transformations

Page 9

Structured Data: Temenos

GLOBUS Banking platform: #1 selling platform, major banks worldwide

Contract-based system, deeply nested data model, user-customizable

80+ major subsystems, 6000 Tables, 100s of GB

“Using Oracle XML DB, we successfully benchmarked 22 million banking transactions per day, which translated to 2500 database-transactions-per-second, for Temenos' GLOBUS banking platform. Oracle XML DB’s performance assured us that powerful XML innovations can be operationalized and deployed without sacrificing enterprise-class scalability.”

- TEMENOS

Page 10

Managing Unstructured Data

• More and more content is being produced as XML (Microsoft Word, Corel XMetaL, Arbortext Epic, …)
  – Markup improves search, processing, organization, …
• XML DB’s Repository enables XML document content to be stored as ‘files’ in ‘folders’ without losing strong management, queryability, unbreakable security, etc.
• XML is doing for unstructured data what Relational did for structured data: creating a standard way to store, query and manage it

Page 11

Managing Unstructured Data with Oracle XML DB

• XML data model and APIs familiar to Content Developers
• Integrated Repository
  – WebDAV compliant
  – XPath index for fast traversal of foldering hierarchies
  – SQL Queryable
• Integrated Text Processing
  – Optimizations such as “tag aware” search

Page 12

Reed Elsevier

• Large technical publishing conglomerate
• More than 1700 scientific, technical & medical peer-reviewed journals
• Over 59 million abstracts
• Over two million full-text scientific journal articles, and another one million full-text articles via CrossRef (http://www.crossref.org/) to other publishers' platforms
• XML DB chosen as Repository Database

Page 13

10g: What’s new in XML DB

• Broad Performance Improvements
  – SQLX query rewrites
  – XSLT optimizations
  – Repository Access and Query optimizations
  – Direct loader support, loading large XML documents
  – Storage optimizations
• I18N: support for differing character sets on client and server
• Schema Evolution
  – Transparently achieves data load/reload
• Unified XML API between XDK and XML DB
  – Unified C interfaces

Page 14

XML-based Integration: XQuery

• Why XQuery?
  – Declarative way to query XML documents
• Why Java?
  – Run in mid-tier or database
  – Future server implementation in C
• Why XML Database?
  – Native XML storage
  – XML data management
  – Performance optimizations
  – SQL/XML or XQuery depending on data
• Status
  – OTN downloads (pending W3C standard finalization in ’04)

Diagram labels: iAS J2EE™ Platform, XQuery Engine, XML DB, Server JVM

Page 15

XQuery Example

Assume a document, emp.xml:

<empset>
  <emp empno="21" ename="SCOTT" salary="120000"/>
  <emp empno="22" ename="JONES" salary="344000"/>
</empset>

To get the names of employees with salary > 200000:

for $i in document('emp.xml')/empset/emp
let $j := 200000
where $i/@salary > $j
return $i/@ename

Result (attribute node): JONES
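
The same selection can be tried without an XQuery engine. What follows is a minimal sketch, assuming only the standard JDK XPath 1.0 API (javax.xml.xpath) rather than any Oracle library. The class name EmpQuery and the method namesWithSalaryAbove are hypothetical names chosen for illustration. The XPath predicate plays the role of the where clause, and the final step selects the ename attribute nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class (not part of any Oracle API).
public class EmpQuery {
    // The emp.xml document from the slide, inlined as a string.
    static final String EMP_XML =
        "<empset>"
      + "<emp empno=\"21\" ename=\"SCOTT\" salary=\"120000\"/>"
      + "<emp empno=\"22\" ename=\"JONES\" salary=\"344000\"/>"
      + "</empset>";

    // Returns the ename values of all emp elements whose salary exceeds the threshold.
    static List<String> namesWithSalaryAbove(String xml, int threshold) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Predicate = the "where" clause; trailing @ename = the "return" clause.
        String expr = "/empset/emp[@salary > " + threshold + "]/@ename";
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getNodeValue()); // attribute node -> its string value
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesWithSalaryAbove(EMP_XML, 200000)); // prints [JONES]
    }
}
```

XPath 1.0 converts the salary attribute to a number before comparing, so no explicit cast is needed in this sketch.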

Page 16

Differences from SQL

• Navigation-oriented (using XPath expressions)
• Different type system (XML Schema based simple types)
• Identity-based (XML node identities and document order)
• Namespace-aware name resolution (functions, variables, element creation)
• Row-based versus item-based
• Results are heterogeneous sequences
• Does not have all SQL extensions (e.g., OLAP, Full-Text, …)

Page 17

Oracle XQuery API

JXQI – Java API (ongoing standards discussions)

import java.io.FileReader;
import java.io.Reader;
import oracle.xquery.*;

// Read the query text from a file and prepare it.
XQueryContext ctx = new XQueryContext();
Reader strm = new FileReader("exmpl1.xml");
XQueryPreparedStatement xq = ctx.prepareStatement(strm);

// Execute the query and print each result node.
XQueryResultSet rset = xq.executeQuery();
while (rset.next())
    rset.getNode().print(System.out);

XQLPlus tool! (like SQLPlus)

Page 18

Datasources

• Enables arbitrary input sources
  – files, cache, JCA datasources
• xmldatasrc – Oracle language addition
• Datasource API
  – initialize
  – describe
  – execute
  – fetch
• Bind (an existing DOM)

Page 19

Rewrite to SQL

XQuery over Oracle databases – Rewrite!

for $i in view("scott.emp")/ROW
where $i/SALARY > 200000
return $i/ENAME

-- is translated to --

select "$i".ename
from scott.emp "$i"
where "$i".salary > 200000;
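
To see why such a rewrite preserves meaning, the clause-by-clause correspondence can be mimicked in plain Java. This is only an illustrative sketch over an in-memory list, not a description of how the Oracle rewrite is implemented. The RewriteSketch and Emp classes are hypothetical, and the sample rows are borrowed from the emp.xml slide as an assumption about what scott.emp might contain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class (not an Oracle API).
public class RewriteSketch {
    // One row of an assumed in-memory stand-in for the scott.emp table.
    static final class Emp {
        final String ename;
        final int salary;
        Emp(String ename, int salary) {
            this.ename = ename;
            this.salary = salary;
        }
    }

    // Each pipeline stage lines up with one XQuery clause and one SQL clause.
    static List<String> highEarners(List<Emp> emp, int threshold) {
        return emp.stream()                          // for $i in view(...)/ROW   ~ from scott.emp "$i"
                  .filter(i -> i.salary > threshold) // where $i/SALARY > N       ~ where "$i".salary > N
                  .map(i -> i.ename)                 // return $i/ENAME           ~ select "$i".ename
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Emp> rows = Arrays.asList(new Emp("SCOTT", 120000), new Emp("JONES", 344000));
        System.out.println(highEarners(rows, 200000)); // prints [JONES]
    }
}
```

Because every clause maps onto a relational operation, the query can be pushed down to the SQL engine instead of materializing rows as XML first.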

Page 20

More SQL rewrite

for $i in view('purchaseOrder')/ROW/PurchaseOrder
where $i/ShipAddr/City = 'San Francisco'
return <PO ponum="{$i/@Poid}">{$i/ShipAddr}</PO>

-- is translated to --

select xmlelement("PO",
         XMLAttributes(extractvalue("$i", '/PurchaseOrder/@Poid') as "ponum"),
         extract("$i", '/PurchaseOrder/ShipAddr'))
from scott.purchaseorder "$i"
where extractvalue("$i", '/PurchaseOrder/ShipAddr/City') = 'San Francisco'

Page 21

DEMONSTRATION

XQuery

Page 22

Oracle Text

• Rich Full-Text Capabilities built into the Oracle database
• Integrated Search support for Applications
  – OCS, Portal, E-Business Suite
• Catalog Search
• Document Archives and Warehouses
• Infrastructure for Intranet and Extranet Search (via Ultra Search)

Page 23

Oracle Text: Rich Full-Text

Page 24

10g: What’s new in Oracle Text?

• Supervised Classification – Rule-based and SVM
• Unsupervised Classification (Clustering) – K-Means and Hierarchical
• Query-Log Analysis
• Query-Templating for Progressive-Relaxation, Query-rewriting, Alternative scoring, etc.
• Index creation improvements – Real-time synchronization
• Better Partitioning: Create local-partitioned indexes in parallel
• Filtering enhancements
  – Filter and index RFC-822 email messages
• Language Enhancements
  – Japanese stemming, Customization of Japanese & Chinese Lexicons
• Information Visualization – Stretch viewer

Page 25

Oracle Ultra Search

• Out-of-the-box heterogeneous search-and-locate capabilities
  – DB, Web Servers, Files, E-Mail, Apps
• High performance threaded Java crawlers
• Web-style interface
• Extensible, customizable (Java API)
  – Customizable metadata search
  – Custom crawling
  – Custom rendering
• Integrated administration
• Fully multilingual and globalized
• Integrated with Oracle Portal (repository, portlet) and Oracle Collaboration Suite

Page 26

Page 27

10g: What’s new in Ultra Search?

• Enhanced Security
  – Secure Crawling (HTTPS support)
  – Better Authentication: HTTP Digest and Forms
  – ACL-secured search hitlist: role-based ACLs per datasource, or custom ACLs stamped by crawler
• Federated Search
  – JCA-compliant Searchlet API
• Unified Search
  – Secure Crawler API
• OID Integration

Page 28

DEMONSTRATION

Information Visualization

Page 29

The Media-enabled Oracle Platform

• Oracle Database 10g
  – Storage, management, & retrieval of image, audio, video data
  – Native format understanding, metadata extraction, methods for image processing
  – Support for leading streaming media servers
• Oracle Application Server 10g
  – JSP, servlet and PL/SQL application development support
  – Media Adaptation Services for Wireless
  – JDeveloper (BC4J/UIX) and Portal integration
• Oracle Collaboration Suite
  – Metadata extraction for OCS Files

Page 30

New Oracle10g Multimedia Features

• Standards Support – SQL/MM Still Image
• New version of Java Advanced Imaging (JAI 1.1.1_01) and additional image processing operators
• Support for additional media formats
  – Microsoft ASF, MPEG2 & MPEG4
• Microsoft Windows Media Server Plugin
• Real Server Plugin for Helix Server
• XML DB integration

Page 31

How Oracle’s Multimedia capabilities are better

Only Oracle10g:
• Supports media content natively
  – No manual initiation of separate processes to enable database tablespace to accept media data
  – No need for DBAs to initiate these processes for each table where they wish to store media data
• Stores all media and its metadata in the same table as the associated relational data
  – No triggers on each and every media object created to update the separate “administration” tables that contain media objects and metadata
  – No added processing and I/O overhead for access and retrieval
• Provides Java class libraries and JSP Tag libraries for application development and media access

Page 32

Oracle is the Leading Spatial Database

“In repeated surveys, IDC has found that Oracle is used in an 80%-90% share of Spatial Information Management oriented database installations.”

- IDC, December 2002

• Oracle 10g Locator feature: Beginning with Oracle9i, LOCATION capabilities have been part of EVERY database at NO ADDITIONAL COST
  – Enables business, web and LBS applications
• Oracle Spatial 10g: Enterprise Edition Option
  – Supports advanced Land Management, GIS, Transportation, Energy / Utilities, Remote Sensing, Defense and Intelligence applications

Page 33

Oracle10g Location Features

Locator
• Points, lines, polygons
• 2D, 3D, 4D data
• Spatial Operators
  – Distance
  – Relationships
• Coordinate Systems
• Long Transactions
• Table Partitioning*
• Object Replication**
• Parallel Query* – NEW!
• Deferred Spatial Indexes – NEW!

Spatial (Enterprise Option)
• All Locator features
• Spatial functions
  – area/length calculation
  – buffer, centroid, intersection, union, etc.
• Linear Referencing
• Spatial Aggregates
• Coordinate Transforms
• GeoRaster – NEW!
• Topology Data Model – NEW!
• Network Data Model – NEW!
• GeoCoder – NEW!
• Spatial Data Analysis & Mining – NEW!

* Requires Enterprise Edition with Partitioning Option
** Some replication features on Enterprise Ed. only

Page 34

Location features in the Oracle “Stack”

Diagram labels: Oracle Database 10g; Spatial; Any device; Data Server; Oracle Application Server 10g; Oracle Location Technology; Oracle core technologies; iAS LBS Components; Application Server; CRM & ERP Applications; TCA schema; e-Business Suite; Locator Online Service; B2B, B2E, B2C; SOAP, WSDL; Web Services; iAS MapViewer / JDeveloper

Page 35

Oracle’s Extensibility Framework

• Open API to plug in new data types and access methods
• Specialty Data Types
  – Chemical
  – Genetic
  – Engineering
  – Biometric
  – Multimedia
• Driven by specialized-domain ISVs – MDL, NetGene, Informax, Protegrity, …

Page 36

Extensibility: In Silico Chemistry

• Chemistry searching requires special techniques
  – Chemical name is not unique: “Viagra®”, “sildenafil citrate”
  – Chemists think graphically
• The solution:
  – A graphical search engine
  – Specialized operators such as substructure search (“sss”) = a chemical “contains”

[Figure: chemical structure diagram]

Page 37

Oracle Collaboration Suite

Consolidate management of unstructured data (email, shared documents and other collaborative content)

Before grid computing, resources such as storage and CPUs had to be managed separately for each component of the suite (e.g. email vs files vs web conferencing).

OCS 10g takes advantage of grid infrastructure for greater efficiency, reduced cost and easier management

Page 38

Extended Data Management

• Structured data will stay Relational
• Documents & Messages will move to XML
• Multimedia will be in BLOBs, with metadata annotated in XML

Oracle provides the most robust, open and extensible platform and the important services for all your data
• Storage and Management
• Search, Interchange, Visualization
• Analytics and Mining

Ultra Search crawls and (where desirable) federates non-Oracle or legacy sources, and brings these into the ambit of uniform access
• Search, Interchange, Visualization
• Analytics and Mining

Oracle Collaboration Suite, Oracle Portal, E-Business Suite provide solutions

Page 39

Q & A

QUESTIONS

ANSWERS

Page 40