
    Big Data integration is the big deal in Informatica 9.1 (OI00141-026)

    Ovum (Published 06/2011) Page 1

    This report is a licensed product and is not to be photocopied

    OVUM OPINION

Big Data integration is the big deal in Informatica 9.1

Reference Code: OI00141-026

    Publication Date: June 2011

    Author: Madan Sheina and Tony Baer

    OVUM VIEW

    Summary

    The highlights of Informatica's recent 9.1 platform release target Big Data integration, self-service,

    upgraded data quality, master data management (MDM), and data service capabilities. It provides

    solid functional updates to what is already a rich and ever-broadening data integration platform.

The Informatica platform already supported data movements with Hadoop through partnerships with Cloudera and EMC, but the new release adds direct, bidirectional connectivity between

    Informatica and Hadoop, tapping an emergent use case for customers seeking the raw power of

    this NoSQL target. The 9.1 release also adds new connectors to social networks, supporting the

    increasingly popular use case of social media analytics.

    Big Data challenges play directly into Informatica's integration strengths

    Big Data represents the confluence of more and new/emerging types of transaction and interaction

    data with demands for more scalable and quicker processing of that data. The issue is not so

much the size of these traditionally siloed repositories of information, but the potential for

    understanding the relationships between them.

    This is where Informatica's competencies come into play. Combining traditional structured

    transactional information with unstructured interaction data generated by humans and the Internet

    (customer records, social media) and, increasingly, machines (sensor data, call detail records) is

    clearly the sweet spot. These types of interaction data have traditionally been difficult to access or

    process using conventional BI systems. The appeal of adding these new data types is to allow

    enterprises to achieve a more complete view of customers, with new insights into relationships and


    behaviors from social media data. That, of course, presents Informatica with an opportunity to

    apply its data integration, profiling, and quality know-how directly to Big Data sets and processing

    environments to enrich data sets as well as master data.

    Not surprisingly, Informatica is calling Big Data the next big growth opportunity for its business,

with 9.1 the first stab of many. However, Ovum believes the focus on Big Data is a natural corollary to the company's last stated big growth opportunity, the Informatica cloud, as both a data source/target and a platform on which to host its products. A big part of Big Data will be driven by

    enterprises seeking to build hybrid architectures that store and integrate data residing in on-

    premise systems and in the cloud.

    Informatica PowerExchange provides the technical foundation for 9.1's

    Big Data play

Informatica supports Big Data in two ways, backing both Hadoop and non-Hadoop processing platforms, and it is doing so largely through its PowerExchange family of data access products.

    In May 2011 the company announced support for EMC Greenplum's distribution of the Hadoop file

    system. The 9.1 release builds on this by adding a new PowerExchange for the Hadoop

    Distributed File System (HDFS) connectivity tool, which augments Big Data processing by moving

    enterprise data into Hadoop clustered environments for highly scalable parallel processing and out

    to targets (such as data warehouses) for consumption and analysis. The benefit is being able to

    reuse existing Informatica development skills in Hadoop environments. This addresses a major

gap identified in the Ovum report What is Big Data: The Big Architecture: the lack of skills for

    Hadoop, MapReduce, and related technologies is currently one of the biggest impediments to

    adoption of NoSQL platforms.
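The movement pattern described above can be sketched in a few lines. This is a hypothetical illustration, not Informatica's actual connector API: a local directory stands in for HDFS, and the push/pull helpers stand in for PowerExchange staging data into a cluster and back out to a warehouse target.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of bidirectional data movement: a local directory
# stands in for an HDFS location; a real connector would address
# hdfs://... paths on the cluster instead.

def push_to_hdfs(records, hdfs_dir, name):
    """Stage enterprise records into the cluster as newline-delimited JSON."""
    path = Path(hdfs_dir) / name
    path.write_text("\n".join(json.dumps(r) for r in records))
    return path

def pull_from_hdfs(path):
    """Read processed results back out for a downstream warehouse target."""
    return [json.loads(line) for line in Path(path).read_text().splitlines()]

hdfs = tempfile.mkdtemp()  # stand-in for an HDFS directory
staged = push_to_hdfs([{"id": 1, "amt": 9.5}, {"id": 2, "amt": 3.0}], hdfs, "orders.json")
rows = pull_from_hdfs(staged)  # round trip back to the "warehouse"
```

The point of the real product is that this plumbing is generated from existing Informatica mappings rather than hand-coded, which is what lets existing development skills carry over.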

    In the next release, Informatica plans to build a more robust offering that includes a graphical

    integrated development environment (IDE) for Hadoop; codeless and metadata-driven

    development; the ability to prepare and integrate data directly inside Hadoop environments; and

    end-to-end metadata lineage across the Informatica, Hadoop, and target environments.

    The 9.1 platform also includes a new set of connectors to various Big Data transactional systems

    to make it easier to meld structured transactional with largely unstructured interaction data

    (including social media). Informatica already offers connectors to popular databases such as

    Oracle, DB2, Teradata, and IBM Netezza, and is planning to put purpose-built advanced SQL

    analytic databases onto its price list, including Teradata/Aster Data, EMC Greenplum, and HP

    Vertica. Informatica has taken the logical first step in supporting social network integration by

    adding connectors for published Twitter, LinkedIn, and Facebook APIs.


    Informatica has also enhanced its B2B Data Exchange Transformation product to make it easier to

    connect to other interaction data gleaned from call detail records (CDR), device/sensor data and

    scientific data (genomic and pharmaceutical), and large image files (through managed file

transfer). Although the initial set of social media adapters is specific to certain sites, Ovum

    expects Informatica to eventually offer a software development kit (SDK) approach that provides

    flexible connectivity to broader social media data sources.
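The SDK-style model Ovum anticipates might look something like the sketch below: a common adapter interface that normalizes site-specific payloads into one interaction-record shape, with a small subclass per network. All class and method names here are our own illustration, not Informatica's.

```python
# Hypothetical sketch of an SDK-style social connector model: one common
# interface, one small adapter subclass per social network.

class SocialAdapter:
    """Normalize site-specific payloads into a common interaction record."""
    site = "generic"

    def fetch(self):
        raise NotImplementedError

    def records(self):
        return [{"site": self.site, "text": t} for t in self.fetch()]

class TwitterAdapter(SocialAdapter):
    site = "twitter"
    def fetch(self):
        # A real adapter would call the published Twitter API here.
        return ["new 9.1 release looks interesting"]

class FacebookAdapter(SocialAdapter):
    site = "facebook"
    def fetch(self):
        # Likewise, a real adapter would call the Facebook Graph API.
        return ["great product update"]

feed = [r for a in (TwitterAdapter(), FacebookAdapter()) for r in a.records()]
```

An SDK of this shape is what would let third parties add adapters for long-tail social sources without waiting for purpose-built connectors.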

Informatica is not alone in providing support for loading data into and accessing data from Hadoop.

    The race is on to provide a standardized set of visual Hadoop-focused tools that build around

    pillars such as MapReduce and access and transformation languages such as Hive and Pig. The

leader will be the one that makes the NoSQL environment comfortable enough for the SQL developer mainstream.

    MDM gets tightened integration with the rest of the platform

Siperian was one of Informatica's more recent and watershed acquisitions, and one of the company's biggest challenges for this release was tighter integration of its MDM technology with the rest of the platform. This

    helps organizations to deliver "authoritative and trustworthy data." Informatica's first move in the

    9.0 release was to allow customers to define data quality rules that could be applied to data

    integration. The 9.1 release further advances integration across the platform by allowing end users

    to reuse the same data quality rules in the MDM environment. Hence, data quality policies can be

    surfaced and reused across data profiling, data cleansing, and MDM as a single process. The key

    benefits are better governance (which avoids having conflicting data quality rules applied across

    systems) and safeguarding existing investments in data quality rule standardization and skills

    (allowing them to be retained and transferred over to the MDM environment).
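The define-once, reuse-everywhere idea can be made concrete with a small sketch. This is our own simplification, not Informatica's rule syntax: a single rule object is applied in a profiling context (to measure conformance) and in an MDM-style cleansing context (to normalize failing values).

```python
import re

# Illustrative sketch of one data quality rule reused in two contexts.
# The rule structure and contexts are our own simplification.

phone_rule = {
    "name": "us_phone",
    "valid": lambda v: re.fullmatch(r"\d{3}-\d{3}-\d{4}", v) is not None,
    "fix": lambda v: re.sub(r"[^\d]", "", v),  # strip non-digit noise
}

def profile(values, rule):
    """Profiling context: share of values passing the rule."""
    return sum(rule["valid"](v) for v in values) / len(values)

def cleanse(values, rule):
    """MDM/cleansing context: normalize values that fail the rule."""
    return [v if rule["valid"](v) else rule["fix"](v) for v in values]

data = ["555-123-4567", "(555) 987 6543"]
score = profile(data, phone_rule)    # conformance measured before cleansing
cleaned = cleanse(data, phone_rule)  # same rule drives the fix-up
```

Because both contexts reference the same rule object, there is one place to change the policy, which is the governance benefit the report describes.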

    Siperian provided a comprehensive multi-data domain solution (customer, product, chart of

    accounts, location, etc.), but architecturally it was rigid. That has changed in 9.1, which supports

multiple MDM deployment styles: registry, single-instance/consolidated hub, coexistence, analytical, and transactional, or federated via cloud or service-oriented architecture. Further flexibility is

enabled through added features that prevent duplicate master data types from being created,

    make master data entity hierarchies and relationships more visible (within the Data Director tool),

    and enhance registry services for quicker on-boarding and updating of metadata (primarily through

    messaging) and more targeted master data search techniques.

    9.1 encourages users to be self-sufficient

    This release also comes with a long list of functional upgrades across the staple tools of the

    Informatica suite, such as data quality, data profiling, federation and virtualization, application ILM,

    event processing, and low-latency messaging. There are simply too many to do each one justice in


    this research note. However, one common thread that stands out across many of these additional

    enhancements in 9.1 is a continued focus on self-service provisioning of (in Informatica parlance)

    "authoritative and trustworthy" data.

    Informatica has worked hard to make its core business more accessible to a broader, non-

    technical IT audience. This is a challenge, as data integration is a complicated IT task that has

    traditionally been the almost exclusive preserve of skilled DBAs and developers.

    Notable functionality to support this accessibility initiative includes the introduction of so-called

    "proactive data quality assurance" services to identify data exceptions more quickly. This is based

    on a complex event processing (CEP)-like model, which allows ETL developers to provide

    comparative profiling analysis to map certain data quality rules and logic against data profiles at

    early stages of the transformation pipeline in order to prevent costly errors from surfacing

    downstream. The model works by dynamically generating and comparing profiles of data as it

    flows through the mapping pipeline. It also enables "top-down" validation of actual versus expected

data in data integration projects, which is particularly useful when upgrading applications.
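The comparative-profiling mechanism can be sketched simply: profile the data before and after a transform, then flag the stage if the profiles drift beyond a tolerance. The profile fields and threshold here are our own simplification of the idea, not Informatica's implementation.

```python
# Hedged sketch of comparative profiling in a mapping pipeline:
# profile a column at two stages and flag drift before it flows downstream.

def profile(rows, column):
    vals = [r.get(column) for r in rows]
    return {
        "nulls": sum(v is None for v in vals),
        "distinct": len({v for v in vals if v is not None}),
    }

def compare(before, after, max_new_nulls=0):
    """Flag the stage if the transform introduced more nulls than allowed."""
    return (after["nulls"] - before["nulls"]) <= max_new_nulls

source = [{"id": 1, "cc": "US"}, {"id": 2, "cc": "GB"}]
# A faulty lookup transform silently drops a country code:
transformed = [{"id": 1, "cc": "US"}, {"id": 2, "cc": None}]
ok = compare(profile(source, "cc"), profile(transformed, "cc"))
```

Catching the broken lookup at this stage is the "proactive" part: the error is surfaced in the pipeline rather than discovered later in a report.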

    There is also a new interactive, self-service Data Integration Analyst workbench for data analysts

    and data stewards, which extends a similar capability introduced for data quality analysts in its 9.0

    release. This workbench aims to empower non-technical users who are close to the business and

arguably have a better business understanding of the data, to define their own data integration mappings

    and routines without having to constantly toggle back to IT developers.

    The creation and validation of source-to-target mappings is handled through a browser-based,

    guided interface that enables business analysts and data stewards to pinpoint data using business

    terms, define source-to-target mappings, selectively apply transform rules (including ETL and data

    quality) from a predefined inventory, validate the rules on the fly, and preview the results of their

    specifications. For example, analysts can find and navigate data sources and targets using

    metadata such as a business glossary or data lineage trails; specify, save, and share their own

    transformation logic with other analysts, projects, or both; and embed existing ETL mapping logic

    and data quality rules into their specification. The Data Integration Analyst tool then automatically

    generates the relevant PowerCenter or Informatica Data Services (IDS) transformation mapping

logic, which can be deployed as virtualized SQL views, published web services, or batch ETL routines.
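The workflow above amounts to compiling a declarative mapping specification into executable transformation logic. The sketch below illustrates the shape of that idea; the field names and rule inventory are hypothetical, and the real product generates PowerCenter or IDS mappings rather than Python functions.

```python
# Illustrative sketch: an analyst's source-to-target mapping specification,
# with rules picked from a predefined inventory, compiled into a row-level
# transform. Names are hypothetical, not Informatica's.

RULES = {  # predefined rule inventory the analyst selects from
    "trim": str.strip,
    "upper": str.upper,
}

mapping = [  # the analyst's specification, expressed declaratively
    {"source": "cust_name", "target": "CustomerName", "rules": ["trim", "upper"]},
    {"source": "cust_city", "target": "City", "rules": ["trim"]},
]

def compile_mapping(spec):
    """Generate an executable transform from the mapping specification."""
    def transform(row):
        out = {}
        for m in spec:
            value = row[m["source"]]
            for rule in m["rules"]:
                value = RULES[rule](value)
            out[m["target"]] = value
        return out
    return transform

run = compile_mapping(mapping)
result = run({"cust_name": "  acme corp ", "cust_city": " London "})
```

The separation matters: the analyst edits only the declarative specification, while the generated logic stays consistent with whatever rules IT has standardized in the inventory.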

    9.1 adds greater project awareness to data virtualization

    Another notable addition to 9.1 is so-called adaptive data services that wrap project-specific

    context and intelligence into the data federation creation and delivery process. This allows delivery

of data from single sources to meet the business needs of all projects, without having to reinvent the wheel for every project, while ensuring consistency.


    Informatica leverages this data virtualization solution as part of the overall platform to enable

physical and virtual data integration depending on business needs. Informatica calls this "multi-protocol data provisioning." It is technically an extension of Informatica's core data services architecture, delivering data through SQL endpoints via ODBC or JDBC, as a web service, or to PowerCenter as a batch process. The key benefit is governance, since the multi-provisioning is based on a common

    logical data object and policy definitions.
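Multi-protocol provisioning, as we read it, means one logical data object and one policy definition surfaced through several delivery styles. The sketch below illustrates that reading; the object structure and endpoint names are ours, not Informatica's.

```python
import json

# Illustrative sketch: a shared logical data object plus policy, defined
# once, delivered through two protocol styles with identical governance.

LOGICAL_CUSTOMER = {
    "fields": ["id", "name"],        # projection the object exposes
    "policy": {"mask": ["name"]},    # governance policy, defined once
}

def apply_policy(row, obj):
    projected = {f: row[f] for f in obj["fields"]}
    for f in obj["policy"]["mask"]:
        projected[f] = "***"
    return projected

def sql_view(rows, obj):
    """SQL-endpoint style delivery (think virtualized view)."""
    return [apply_policy(r, obj) for r in rows]

def web_service(rows, obj):
    """Web-service style delivery of the same logical object."""
    return json.dumps(sql_view(rows, obj))

rows = [{"id": 7, "name": "Ada", "ssn": "x"}]
via_sql = sql_view(rows, LOGICAL_CUSTOMER)
via_ws = web_service(rows, LOGICAL_CUSTOMER)
```

Because both endpoints route through the same logical object and policy, a consumer gets the same masked, projected view regardless of protocol, which is the governance benefit the report highlights.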

    APPENDIX

    Disclaimer

    All Rights Reserved.

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any

    form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior

    permission of the publisher, Ovum (a subsidiary company of Datamonitor plc).

    The facts of this report are believed to be correct at the time of publication but cannot be

    guaranteed. Please note that the findings, conclusions and recommendations that Ovum delivers

    will be based on information gathered in good faith from both primary and secondary sources,

whose accuracy we are not always in a position to guarantee. As such, Ovum can accept no liability whatsoever for actions taken based on any information that may subsequently prove to be

    incorrect.