UNIT 6 RECENT TRENDS

UNIT 6 RECENT TRENDS. Hadoop Related Subprojects Hive SQL-like Query language and Metastore Pig High-level language for data analysis HBase

Hadoop Related Subprojects

• Hive: SQL-like query language and Metastore
• Pig: High-level language for data analysis
• HBase: Table storage for semi-structured data
• ZooKeeper: Coordinating distributed applications
• Mahout: Machine learning

Making of Hive

Whose idea was this?

Creating a Hive Table

Partitioning breaks the table into separate directories, one for each (dt, country) pair.

Ex: /hive/page_view/dt=2008-06-08/country=USA
    /hive/page_view/dt=2008-06-08/country=CA

CREATE TABLE page_views(
    viewTime INT,
    userid BIGINT,
    page_url STRING,
    referrer_url STRING,
    ip STRING COMMENT 'User IP address')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;

A Simple Query

• Find all page views coming from xyz.com during the month of March:

SELECT page_views.*
FROM page_views
WHERE page_views.dt >= '2008-03-01'
  AND page_views.dt <= '2008-03-31'
  AND page_views.referrer_url LIKE '%xyz.com';

• Because dt is a partition column, Hive reads only the partitions from 2008-03-01 through 2008-03-31 instead of scanning the entire table
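The effect of partition pruning can be illustrated outside Hive. The following is a minimal Python sketch, not Hive's implementation: the partition list is made-up sample data, and `prune_partitions` is a hypothetical helper introduced only for illustration. It shows that a date-range filter on the partition column lets most partitions be skipped without reading any rows.

```python
from datetime import date

# Hypothetical partition list: one (dt, country) pair per partition of page_views.
PARTITIONS = [
    ("2008-02-28", "USA"),
    ("2008-03-01", "USA"),
    ("2008-03-01", "CA"),
    ("2008-03-31", "USA"),
    ("2008-06-08", "USA"),
    ("2008-06-08", "CA"),
]

def prune_partitions(partitions, start, end):
    """Return only the partitions whose dt falls inside [start, end]."""
    selected = []
    for dt, country in partitions:
        if start <= date.fromisoformat(dt) <= end:
            selected.append((dt, country))
    return selected

# Only these partitions would need to be scanned for the March query.
march = prune_partitions(PARTITIONS, date(2008, 3, 1), date(2008, 3, 31))
print(march)
```

The remaining filter (the referrer_url pattern) would then be applied only to rows inside the selected partitions.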

Pig

• Started at Yahoo! Research
• Now runs about 30% of Yahoo!’s jobs

Features:

• Expresses sequences of MapReduce jobs
• Data model: nested “bags” of items
• Provides relational (SQL) operators (JOIN, GROUP BY, etc.)
• Easy to plug in Java functions

An Example Problem

Suppose you have user data in a file, website data in another, and you need to find the top 5 most visited pages by users aged 18-25

Load Users

Load Pages

Filter by age

Join on name

Group on url

Count clicks

Order by clicks

Take top 5

In Pig Latin

Users = load 'users' as (name, age);
Filtered = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
Joined = join Filtered by name, Pages by user;
Grouped = group Joined by url;
Summed = foreach Grouped generate group, COUNT(Joined) as clicks;
Sorted = order Summed by clicks desc;
Top5 = limit Sorted 5;
store Top5 into 'top5sites';
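For comparison, the same data flow can be written for a single machine in plain Python. This is only a sketch to make the steps concrete: the sample records and the function name `top_pages` are assumptions, and it assumes user names are unique. Pig runs the equivalent logic as distributed MapReduce jobs.

```python
from collections import Counter

# Hypothetical sample data standing in for the 'users' and 'pages' files.
users = [("alice", 22), ("bob", 31), ("carol", 19), ("dave", 17)]
pages = [("alice", "a.com"), ("alice", "b.com"), ("carol", "a.com"),
         ("bob", "a.com"), ("carol", "c.com"), ("dave", "b.com")]

def top_pages(users, pages, low=18, high=25, n=5):
    # Filter by age (assumes each user name appears once in users)
    filtered = {name for name, age in users if low <= age <= high}
    # Join on name, then group on url and count clicks
    clicks = Counter(url for user, url in pages if user in filtered)
    # Order by clicks (descending) and take the top n
    return clicks.most_common(n)

top5 = top_pages(users, pages)
print(top5)
```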

Ease of Translation

Each step of the data flow maps to one Pig Latin statement:

Load Users         Users = load …
Filter by age      Fltrd = filter …
Load Pages         Pages = load …
Join on name       Joined = join …
Group on url       Grouped = group …
Count clicks       Summed = … count()…
Order by clicks    Sorted = order …
Take top 5         Top5 = limit …

The statements are executed as three MapReduce jobs: Job 1, Job 2, Job 3.

Refer Netezza slides

Teradata

Teradata Corporation is a publicly held international computer company that sells analytic data platforms, marketing applications, and related services.

Its analytics products are meant to consolidate data from different sources and make the data available for analysis.

Teradata marketing applications are meant to support marketing teams that use data analytics to inform and develop programs.

Teradata is an enterprise software company that develops and sells a relational database management system (RDBMS) with the same name. Teradata is publicly traded on the New York Stock Exchange (NYSE) under the stock symbol TDC.

The Teradata product is referred to as a "data warehouse system" that stores and manages data.

The data warehouses use a "shared nothing" architecture, which means that each server node has its own memory and processing power. Adding more servers and nodes increases the amount of data that can be stored. The database software sits on top of the servers and spreads the workload among them.
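The shared-nothing idea can be sketched in a few lines of Python. This is an illustrative toy, not Teradata's implementation: the `Node` class, the CRC32-based placement in `node_for`, and the sample rows are all assumptions. Each node keeps its own rows, a hash of the key decides which node owns a row, and a query combines per-node partial results.

```python
import zlib

class Node:
    """One server node with its own private storage (nothing is shared)."""
    def __init__(self, name):
        self.name = name
        self.rows = []

def node_for(key, nodes):
    # Deterministic hash so the same key always lands on the same node.
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]

def load(rows, nodes):
    for row in rows:
        node_for(row["id"], nodes).rows.append(row)

def total_sales(nodes):
    # Each node aggregates its own rows; only the partial sums are combined.
    return sum(sum(r["amount"] for r in n.rows) for n in nodes)

nodes = [Node(f"node{i}") for i in range(4)]
load([{"id": i, "amount": 10 * i} for i in range(100)], nodes)
print([len(n.rows) for n in nodes], total_sales(nodes))
```

Adding a node to the list increases total capacity, at the cost of redistributing rows whose hash now maps elsewhere.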

Text analytics can be used to track unstructured data, such as word processor documents, and semi-structured data, such as spreadsheets.

Teradata's product can be used for business analysis. Data warehouses can track company data, such as sales, customer preferences, product placement, etc.

Change data capture

In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data.

Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources.

CDC solutions occur most often in data-warehouse environments since capturing and preserving the state of data across time is one of the core functions of a data warehouse, but CDC can be utilized in any database or data repository system.

Methodology

System developers can set up CDC mechanisms in a number of ways and in any one or a combination of system layers, from application logic down to physical storage.

In a simplified CDC context, one computer system has data believed to have changed from a previous point in time, and a second computer system needs to take action based on that changed data. The former is the source, the latter is the target. It is possible that the source and target are the same system physically, but that would not change the design pattern logically.

Multiple CDC solutions can exist in a single system.

Timestamps on rows

Tables whose changes must be captured may have a column that represents the time of last change. Names such as LAST_UPDATE are common.

Any row in any table that has a timestamp in that column that is more recent than the last time data was captured is considered to have changed.
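A minimal sketch of timestamp-based capture, using Python's built-in sqlite3 module. The `accounts` table, its sample rows, and the `capture_changes` helper are assumptions made for illustration; timestamps are stored as ISO-formatted text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, last_update TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, 100.0, "2008-11-01 09:00:00"),
     (2, 250.0, "2008-11-02 00:15:00"),
     (3, 75.0, "2008-11-02 08:30:00")],
)

def capture_changes(conn, last_capture):
    """Return rows whose last_update is more recent than the previous capture."""
    cur = conn.execute(
        "SELECT id, balance, last_update FROM accounts "
        "WHERE last_update > ? ORDER BY id",
        (last_capture,),
    )
    return cur.fetchall()

changed = capture_changes(conn, "2008-11-01 23:59:59")
print(changed)
```

One known limitation of this technique is that deleted rows leave no timestamp behind, so deletions are not captured.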

Version numbers on rows

Database designers give tables whose changes must be captured a column that contains a version number. Names such as VERSION_NUMBER are common.

When data in a row changes, its version number is updated to the current version. A supporting construct such as a reference table with the current version in it is needed. When a change capture occurs, all data with the latest version number is considered to have changed. When the change capture is complete, the reference table is updated with a new version number.

Three or four major techniques exist for doing CDC with version numbers.
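The reference-table variant described above can be sketched with sqlite3 as follows. The table names (`cdc_version`, `customers`) and helper functions are hypothetical. Changed rows are stamped with the current version, a capture reads every row carrying that version, and the reference table is then advanced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cdc_version (current_version INTEGER);
    INSERT INTO cdc_version VALUES (1);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, version_number INTEGER);
    INSERT INTO customers VALUES (1, 'Ann', 0);
    INSERT INTO customers VALUES (2, 'Bob', 0);
""")

def update_name(conn, row_id, name):
    # Stamp the changed row with the current version from the reference table.
    conn.execute(
        "UPDATE customers SET name = ?, "
        "version_number = (SELECT current_version FROM cdc_version) WHERE id = ?",
        (name, row_id),
    )

def capture(conn):
    # Rows carrying the current version changed since the last capture.
    rows = conn.execute(
        "SELECT id, name FROM customers "
        "WHERE version_number = (SELECT current_version FROM cdc_version) "
        "ORDER BY id"
    ).fetchall()
    # Capture complete: move the reference table to a new version number.
    conn.execute("UPDATE cdc_version SET current_version = current_version + 1")
    return rows

update_name(conn, 2, "Robert")
first = capture(conn)    # sees the change to row 2
second = capture(conn)   # nothing changed since the first capture
print(first, second)
```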

Status indicators on rows

This technique can supplement or complement timestamps and versioning, or serve as an alternative to them. For example, a status column can be set up on a table row to indicate that the row has changed (e.g., a Boolean column that, when set to true, indicates that the row has changed).

Time/Version/Status on rows

This approach combines the three previously discussed methods. As noted, it is not uncommon to see multiple CDC solutions at work in a single system. However, the combination of time, version, and status provides a particularly powerful mechanism, and programmers should use them as a trio where possible. The three elements are not redundant or superfluous.

Using them together allows for such logic as, "Capture all data for version 2.1 that changed between 6/1/2005 12:00 a.m. and 7/1/2005 12:00 a.m. where the status code indicates it is ready for production."
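The quoted capture rule translates directly into a query with three predicates. The sketch below uses sqlite3 with a hypothetical `items` table and an assumed status value `READY` standing in for "ready for production".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER, version TEXT, last_update TEXT, status TEXT)"
)
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (1, "2.1", "2005-06-15 10:00:00", "READY"),
    (2, "2.1", "2005-06-20 12:00:00", "DRAFT"),   # wrong status
    (3, "2.0", "2005-06-10 08:00:00", "READY"),   # wrong version
    (4, "2.1", "2005-07-02 09:00:00", "READY"),   # outside the time window
])

# Version, time window, and status are checked together in one capture query.
rows = conn.execute(
    "SELECT id FROM items "
    "WHERE version = ? AND last_update >= ? AND last_update < ? AND status = ?",
    ("2.1", "2005-06-01 00:00:00", "2005-07-01 00:00:00", "READY"),
).fetchall()
print(rows)
```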

Triggers on tables

May include a publish/subscribe pattern to communicate the changed data to multiple targets. In this approach, triggers log events that happen to the transactional table into another queue table that can later be "played back".

For example, imagine an Accounts table. When transactions are taken against this table, triggers fire and store a history of the event, or even the deltas, in a separate queue table. The queue table might have a schema with the following fields: Id, TableName, RowId, TimeStamp, Operation. The data inserted for our Accounts example might be: 1, Accounts, 76, 11/02/2008 12:15am, Update. More complicated designs might log the actual data that changed. This queue table could then be "played back" to replicate the data from the source system to a target.

An example of this technique is the pattern known as the log trigger.
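The queue-table design can be tried out with SQLite triggers. This is a sketch under assumptions: the trigger names and the `change_queue` column names are chosen for illustration and loosely mirror the fields listed above, and only inserts and updates are logged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE change_queue (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT,
        row_id INTEGER,
        time_stamp TEXT DEFAULT CURRENT_TIMESTAMP,
        operation TEXT
    );
    CREATE TRIGGER accounts_ai AFTER INSERT ON accounts BEGIN
        INSERT INTO change_queue (table_name, row_id, operation)
        VALUES ('Accounts', NEW.id, 'Insert');
    END;
    CREATE TRIGGER accounts_au AFTER UPDATE ON accounts BEGIN
        INSERT INTO change_queue (table_name, row_id, operation)
        VALUES ('Accounts', NEW.id, 'Update');
    END;
""")

conn.execute("INSERT INTO accounts VALUES (76, 100.0)")
conn.execute("UPDATE accounts SET balance = 150.0 WHERE id = 76")

# "Play back" the queue in order, as a downstream target would.
events = conn.execute(
    "SELECT table_name, row_id, operation FROM change_queue ORDER BY id"
).fetchall()
print(events)
```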

Event programming

Coding change detection into an application at appropriate points is another method that can intelligently discern that data has changed. Although this method requires more programming than the more easily implemented "dumb" triggers, it may provide more accurate and desirable CDC, such as capturing a change only after a COMMIT, or only after certain columns change to certain values, which is just what the target system is looking for.

Log scanners on databases

Most database management systems manage a transaction log that records changes made to the database contents and to metadata.

By scanning and interpreting the contents of the database transaction log, one can capture the changes made to the database.
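Real transaction logs are vendor-specific and usually binary, so the following Python sketch uses a made-up, JSON-lines log format purely to show the idea. The record fields and the `scan_log` function are assumptions. The scanner buffers changes per transaction and emits them only once the transaction commits.

```python
import json

# Hypothetical, simplified transaction log: one JSON record per line.
LOG = """\
{"txn": 1, "op": "BEGIN"}
{"txn": 1, "op": "UPDATE", "table": "accounts", "row": 76, "balance": 150}
{"txn": 2, "op": "BEGIN"}
{"txn": 2, "op": "INSERT", "table": "accounts", "row": 77, "balance": 20}
{"txn": 1, "op": "COMMIT"}
{"txn": 2, "op": "ROLLBACK"}
"""

def scan_log(text):
    """Return the changes belonging to committed transactions, in commit order."""
    pending, committed = {}, []
    for line in text.splitlines():
        rec = json.loads(line)
        txn, op = rec["txn"], rec["op"]
        if op == "BEGIN":
            pending[txn] = []
        elif op == "COMMIT":
            committed.extend(pending.pop(txn, []))
        elif op == "ROLLBACK":
            pending.pop(txn, None)   # discard changes that never took effect
        else:
            pending.setdefault(txn, []).append(rec)
    return committed

changes = scan_log(LOG)
print(changes)
```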

Tracking the capture

Tracking the changes depends on the data source. If the data is being persisted in a modern database, then change data capture is a simple matter of permissions. Two techniques are in common use:

• Tracking changes using database triggers
• Reading the transaction log as, or shortly after, it is written

If the data is not in a modern database, change data capture becomes a programming challenge.

Push versus pull

Push: the source process creates a snapshot of changes within its own process and delivers rows downstream. The downstream process uses the snapshot, creates its own subset and delivers it to the next process.

Pull: the target that is immediately downstream from the source prepares a request for data from the source. The downstream target delivers the snapshot to the next target, as in the push model.
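The difference between the two models is mainly who initiates the transfer. The sketch below is a toy illustration with hypothetical `Source` and `Target` classes: in the push case the source delivers its snapshot to subscribers, and in the pull case the target asks for it.

```python
class Source:
    def __init__(self):
        self.changes = []
        self.subscribers = []

    def record(self, change):
        self.changes.append(change)

    def push(self):
        # Push: the source builds the snapshot and delivers it downstream.
        snapshot, self.changes = self.changes, []
        for target in self.subscribers:
            target.receive(snapshot)

    def request_changes(self):
        # Pull: the source only answers a request prepared by the target.
        snapshot, self.changes = self.changes, []
        return snapshot

class Target:
    def __init__(self):
        self.received = []

    def receive(self, snapshot):
        self.received.extend(snapshot)

    def pull_from(self, source):
        self.receive(source.request_changes())

source = Source()
pushed, pulled = Target(), Target()
source.subscribers.append(pushed)

source.record("row 76 updated")
source.push()                 # source-initiated delivery

source.record("row 77 inserted")
pulled.pull_from(source)      # target-initiated request
print(pushed.received, pulled.received)
```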

Real-time business intelligence (RTBI)

Real-time business intelligence (RTBI) is the process of delivering business intelligence (BI) or information about business operations as they occur. Real time means near-zero latency and access to information whenever it is required.

Business transactions as they occur are fed to a real-time BI system that maintains the current state of the enterprise. The RTBI system not only supports the classic strategic functions of data warehousing for deriving information and knowledge from past enterprise activity, but it also provides real-time tactical support to drive enterprise actions that react immediately to events as they occur.

It replaces both the classic data warehouse and the enterprise application integration (EAI) functions. Such event-driven processing is a basic tenet of real-time business intelligence.

While traditional BI presents historical data for manual analysis, RTBI compares current business events with historical patterns to detect problems or opportunities automatically. This automated analysis capability enables corrective actions to be initiated and/or business rules to be adjusted to optimize business processes.

Up-to-the-minute data is analyzed either directly from operational sources or by feeding business transactions into a real-time data warehouse and business intelligence system.

Real-time business intelligence makes sense for some applications but not for others – a fact that organizations need to take into account as they consider investments in real-time BI tools. Key to deciding whether a real-time BI strategy would pay dividends is understanding the needs of the business and determining whether end users require immediate access to data for analytical purposes, or if something less than real time is fast enough.

Evolution of RTBI

Decisions that are based on the most current data available can improve customer relationships, increase revenue, maximize operational efficiencies, and even save lives. The technology that enables such decisions is real-time business intelligence. Real-time business intelligence systems provide the information necessary to strategically improve an enterprise’s processes as well as to take tactical advantage of events as they occur.

Latency

All real-time business intelligence systems have some latency, but the goal is to minimize the time from the business event happening to a corrective action or notification being initiated. Analyst Richard Hackathorn describes three types of latency:

• Data latency: the time taken to collect and store the data
• Analysis latency: the time taken to analyze the data and turn it into actionable information
• Action latency: the time taken to react to the information and take action

Real-time business intelligence technologies are designed to reduce all three latencies to as close to zero as possible, whereas traditional business intelligence only seeks to reduce data latency and does not address analysis latency or action latency, since both are governed by manual processes.
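The three latencies add up to the total delay between a business event and the resulting action. A small worked example in Python follows; the timestamps are invented and the `latencies` helper is a hypothetical name.

```python
from datetime import datetime

def latencies(event, stored, analyzed, acted):
    """Split the event-to-action delay into the three latencies (in seconds)."""
    return {
        "data": (stored - event).total_seconds(),
        "analysis": (analyzed - stored).total_seconds(),
        "action": (acted - analyzed).total_seconds(),
        "total": (acted - event).total_seconds(),
    }

# Hypothetical timestamps for one business event.
result = latencies(
    event=datetime(2008, 11, 2, 0, 15, 0),
    stored=datetime(2008, 11, 2, 0, 15, 2),
    analyzed=datetime(2008, 11, 2, 0, 15, 7),
    acted=datetime(2008, 11, 2, 0, 15, 37),
)
print(result)
```

In this made-up case the action latency dominates, which is the situation the text attributes to manual processes.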

The concept of right time business intelligence proposes that information should be delivered just before it is required, and not necessarily in real-time.

Event-based

Real-time business intelligence systems are event driven, and may use complex event processing, event stream processing and mashup (web application hybrid) techniques to enable events to be analyzed without first being transformed and stored in a database. These in-memory techniques have the advantage that high rates of events can be monitored, and since data does not have to be written into databases, data latency can be reduced to milliseconds.
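As a rough illustration of in-memory event analysis, the sketch below keeps a sliding window of recent failures and raises an alert when a threshold is reached, without writing anything to a database. The `FailureRateMonitor` class, the event format, and the thresholds are assumptions; production CEP engines offer far richer pattern languages.

```python
from collections import deque

class FailureRateMonitor:
    """Alert when too many failures occur within a sliding time window."""
    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.failures = deque()  # timestamps of recent failures, oldest first

    def on_event(self, timestamp, status):
        # Drop failures that have fallen out of the window.
        while self.failures and timestamp - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if status == "FAILED":
            self.failures.append(timestamp)
        return len(self.failures) >= self.threshold

monitor = FailureRateMonitor(window_seconds=60, threshold=3)
stream = [(0, "OK"), (10, "FAILED"), (20, "FAILED"),
          (90, "FAILED"), (100, "FAILED"), (110, "FAILED")]
alerts = [t for t, status in stream if monitor.on_event(t, status)]
print(alerts)
```

The two early failures expire before the later burst arrives, so only the third failure of the burst triggers an alert.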

Data warehouse

An alternative approach to event-driven architectures is to shorten the refresh cycle of an existing data warehouse so that the data is updated more frequently. These real-time data warehouse systems can achieve near real-time update of data, where the data latency typically is in the range from minutes to hours. The analysis of the data is still usually manual, so the total latency is significantly different from that of event-driven architectural approaches.

Server-less technology

The latest alternative innovation to "real-time" event-driven and/or "real-time" data warehouse architectures is MSSO Technology (Multiple Source Simple Output), which removes the need for the data warehouse and intermediary servers altogether, since it is able to access live data directly from the source (even from multiple, disparate sources). Because live data is accessed directly by server-less means, it provides the potential for zero-latency, real-time data in the truest sense.

Process-aware

This is sometimes considered a subset of operational intelligence and is also identified with business activity monitoring. It allows entire processes (transactions, steps) to be monitored, and metrics (latency, completion/failed ratios, etc.) to be viewed, compared with warehoused historic data, and trended in real time. Advanced implementations allow threshold detection, alerting, and providing feedback to the process execution systems themselves, thereby 'closing the loop'.

Technologies that support real-time analytics

Technologies that can be used to enable real-time business intelligence include data visualization, data federation, enterprise information integration, enterprise application integration and service-oriented architecture. Complex event processing tools can be used to analyze data streams in real time and either trigger automated actions or alert workers to patterns and trends.

Data warehouse appliance: A data warehouse appliance is a combination of hardware and software products designed exclusively for analytical processing.

In a conventional data warehouse implementation, tasks such as tuning, adding or editing structure around the data, migrating data from other databases, and reconciling data are done by the DBA.

Another task for the DBA is to make the database perform well for large sets of users. With data warehouse appliances, by contrast, the vendor is responsible for the physical design and for tuning the software to the hardware. A data warehouse appliance package comes with its own operating system, storage, DBMS, software, and required hardware. If required, data warehouse appliances can be easily integrated with other tools.

Mobile technology: Only a limited number of vendors provide mobile business intelligence (MBI). MBI is integrated with the existing BI architecture: it is a package that exposes existing BI applications on mobile phones so that people can make informed decisions in real time.

Application areas

• Algorithmic trading
• Fraud detection
• Systems monitoring
• Application performance monitoring
• Customer relationship management
• Demand sensing
• Dynamic pricing and yield management
• Data validation
• Operational intelligence and risk management
• Payments & cash monitoring
• Data security monitoring
• Supply chain optimization
• RFID/sensor network data analysis
• Work streaming
• Call center optimization
• Enterprise mashups and mashup dashboards

Transportation industry

The transportation industry can benefit from real-time analytics. Consider, for example, a railroad network: depending on the results provided by real-time analytics, a dispatcher can decide what kind of train to dispatch on a track based on train traffic and the commodities being shipped.

Operational intelligence

Operational intelligence (OI) is a category of real-time, dynamic business analytics that delivers visibility and insight into data, streaming events and business operations.

Operational Intelligence solutions run queries against streaming data feeds and event data to deliver real-time analytic results as operational instructions.

Operational Intelligence provides organizations the ability to make decisions and immediately act on these analytic insights, through manual or automated actions.

Purpose

To monitor business activities and identify and detect situations relating to inefficiencies, opportunities, and threats, and to provide operational solutions.

An event-centric approach to delivering information that empowers people to make better decisions.

The metrics that OI produces act as the starting point for further analysis (drilling down into details and performing root cause analysis, that is, tying anomalies to specific transactions of the business activity).

OI solutions also provide the ability to associate metadata with metrics, process steps, channels, etc. With this, it becomes easy to get related information, e.g., "retrieve the contact information of the person that manages the application that executed the step in the business transaction that took 60% more time than the norm," or "view the acceptance/rejection trend for the customer who was denied approval in this transaction," or "Launch the application that this process step interacted with."

Features

Different operational intelligence solutions may use many different technologies and be implemented in different ways. Common features of an operational intelligence solution:

• Real-time monitoring
• Real-time situation detection
• Real-time dashboards for different user roles
• Correlation of events
• Industry-specific dashboards
• Multidimensional analysis
• Root cause analysis
• Time series and trend analysis

Big Data Analytics: Operational Intelligence is well suited to address the inherent challenges of Big Data. Operational Intelligence continuously monitors and analyzes the variety of high velocity, high volume Big Data sources. Often performed in memory, OI platforms and solutions then present the incremental calculations and changes, in real-time, to the end-user.

Technology components

Operational intelligence solutions share many features, and therefore many also share technology components. This is a list of some of the commonly found technology components, and the features they enable:

Business activity monitoring (BAM) - Dashboard customization and personalization

Complex event processing (CEP) - Advanced, continuous analysis of real-time information and historical data

Business process management (BPM) - To perform model-driven execution of policies and processes defined as Business Process Model and Notation (BPMN) models

Metadata framework to model and link events to resources

Multi-channel publishing and notification

Dimensional database

Root cause analysis

Multi-protocol event collection

Operational intelligence is a relatively new market segment (compared to the more mature business intelligence and business process management segments).

It places complete information at one's fingertips, enabling one to make smarter decisions in time to maximize impact. By correlating a wide variety of events and data from both streaming feeds and historical data silos, operational intelligence helps organizations gain real-time visibility of information in context. Advanced dashboards give real-time insight into business performance, health and status, so that immediate action based on business policies and processes can be taken. Operational intelligence applies the benefits of real-time analytics, alerts, and actions to a broad spectrum of use cases across and beyond the enterprise.

One specific technology segment is AIDC (Automatic Identification and Data Capture) represented by barcodes, RFID and voice recognition.

Comparison with other technologies or solutions

Business intelligence

OI is often linked to or compared with business intelligence (BI) or real-time business intelligence, in the sense that both help make sense out of large amounts of information. But there are some basic differences:

OI is primarily activity-centric, whereas BI is primarily data-centric. As with most technologies, each of these could be sub-optimally coerced to perform the other's task. OI is, by definition, real-time, unlike BI or “On-Demand” BI, which are traditionally after-the-fact and report-based approaches to identifying patterns. Real-time BI (i.e., On-Demand BI) relies on the database as the sole source of events.

OI provides continuous, real-time analytics on data at rest and data in-flight, whereas BI typically looks only at historical data at rest. OI and BI can be complementary. OI is best used for short-term planning, such as deciding on the “next best action,” while BI is best used for longer-term planning (over the next days to weeks). BI requires a more reactive approach, often reacting to events that have already taken place.

If all that is needed is a glimpse at historical performance over a very specific period of time, existing BI solutions should meet the requirement. However, if historical data needs to be analyzed together with events that are happening now, or if the goal is to reduce the time between when intelligence is received and when action is taken, then operational intelligence is the more appropriate approach.

Systems management & OI

System Management mainly refers to the availability and capability monitoring of IT infrastructure.

Availability monitoring refers to monitoring the status of IT infrastructure components such as servers, routers, networks, etc.

This usually entails pinging or polling the component and waiting to receive a response.

Capability monitoring usually refers to synthetic transactions where user activity is mimicked by a special software program, and the responses received are checked for correctness.

Complex event processing & OI

There is a strong relationship between complex event processing companies and operational intelligence, especially since CEP is regarded by many OI companies as a core component of their OI solutions.

CEP companies tend to focus solely on development of a CEP framework for other companies to use within their organisations as a pure CEP engine.
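To give a feel for what a CEP engine does, the sketch below implements one very small rule: raise an alert when a given number of events of the same kind occur for the same key within a sliding time window. The `Event` and `ThresholdRule` names, the event kinds and the thresholds are assumptions made for illustration; real CEP engines offer far richer pattern languages. The sketch also assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point
    kind: str
    key: str          # e.g. a user or device identifier


class ThresholdRule:
    """Fires when `count` events of one kind share a key within `window` seconds."""

    def __init__(self, kind: str, count: int, window: float) -> None:
        self.kind = kind
        self.count = count
        self.window = window
        self._recent = defaultdict(deque)  # key -> timestamps inside the window

    def feed(self, event: Event) -> bool:
        if event.kind != self.kind:
            return False
        stamps = self._recent[event.key]
        stamps.append(event.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and event.timestamp - stamps[0] > self.window:
            stamps.popleft()
        return len(stamps) >= self.count


rule = ThresholdRule("login_failed", count=3, window=60.0)
stream = [
    Event(0, "login_failed", "alice"),
    Event(10, "login_ok", "bob"),
    Event(20, "login_failed", "alice"),
    Event(50, "login_failed", "alice"),
    Event(200, "login_failed", "alice"),
]
alerts = [e.timestamp for e in stream if rule.feed(e)]
```

Here the rule fires only on the third failure inside the 60-second window; the failure at 200 seconds starts a fresh window.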

Business activity monitoring & OI

Business activity monitoring (BAM) is software that aids in monitoring of business processes, as those processes are implemented in computer systems.

BAM is an enterprise solution primarily intended to provide a real-time summary of business processes to operations managers and upper management.

The main difference between BAM and OI appears to be in the implementation details: real-time situation detection appears in both BAM and OI and is often implemented using CEP.

BAM focuses on high-level process models whereas OI instead relies on correlation to infer a relationship between different events.

Business process management & OI

A business process management suite is the runtime environment where one can perform model-driven execution of policies and processes defined as BPMN models.

As part of an operational intelligence suite, a BPM suite can provide the capability to define and manage policies across the enterprise, apply the policies to events, and then take action according to the predefined policies.

A BPM suite also provides the capability to define policies as if/then statements and apply them to events.
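The idea of policies as if/then statements applied to events can be sketched as follows. The `Policy` class, the two example policies and the event fields (`type`, `id`, `amount`, `quantity`) are hypothetical and only illustrate the pattern; a real BPM suite would define such rules through its own modelling tools.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Policy:
    name: str
    condition: Callable[[Event], bool]  # the "if" part
    action: Callable[[Event], str]      # the "then" part


def apply_policies(policies: List[Policy], event: Event) -> List[str]:
    """Run every policy whose condition matches and collect the action results."""
    return [p.action(event) for p in policies if p.condition(event)]


policies = [
    Policy(
        name="large-order-review",
        condition=lambda e: e.get("type") == "order" and e.get("amount", 0) > 10_000,
        action=lambda e: f"route order {e['id']} to manual review",
    ),
    Policy(
        name="stock-reorder",
        condition=lambda e: e.get("type") == "stock" and e.get("quantity", 0) < 5,
        action=lambda e: f"reorder item {e['id']}",
    ),
]
```

Keeping the condition and the action as separate callables means that policies can be added or changed without touching the code that dispatches events.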

extra

Operational business intelligence, sometimes called real-time business intelligence, is an approach to data analysis that enables decisions based on the real-time data companies generate and use on a day-to-day basis. Typically, the data is queried from within an organization’s enterprise applications.  Operational business intelligence technology is primarily targeted at front-line workers, such as call center operators, who need timely data to do their jobs.

With operational BI, analysis can take place in tandem with business processing, so that problems can be spotted and dealt with sooner than with conventional after-the-fact business intelligence (BI) approaches. It enables the creation of a performance and feedback loop in which decision makers can analyze what’s happening in the business, act upon their findings and immediately see the results of those actions.

Data must be extremely current, which isn’t always possible within the traditional bounds of enterprise reporting and data warehousing. However, most business processes at a typical company don’t require real-time data. With that in mind, a key part of every operational BI project is determining which business users need up-to-the-minute data for BI purposes and how that data will be delivered to them.

What is Embedded BI?

Developers who want to embed business intelligence and reporting features into applications must find a balance among requirements, cost and development time.

Today visionary software vendors and developers are blurring the lines between analytical and operational processes by embedding BI software into their applications.

They understand the benefits of process-driven BI, operational dashboards, and ad hoc reporting from transactional systems. They also realize that their applications are the source of organisational information and that embedded BI is a key requirement for meeting their clients’ management information needs.

Embedded BI is an integral part of a business application or process, not a separate application. As customers seek embedded reporting in their applications, vendors that provide it will develop a competitive advantage, build customer loyalty and extend the life of their product within their existing customer base.

To achieve the competitive benefits of embedded BI, application developers need to select underlying technology that will support the basic requirements of the environment. These requirements are:

Real Time BI - Embedded BI acts on real time data, not time delayed data stored in a separate warehouse or OLAP cube. A key factor in this is the source of the data - it comes from the application (or uses the same source as the application), not a data warehouse or data mart.

Seamless Integration – Users do not want to switch applications between operational and reporting activities. Integrated security and a consistent look and feel help create a seamless integration between the host application and reporting.

End User Centric – Embedded BI is much more end-user focused than traditional BI. With embedded BI you cannot assume that your users have knowledge of both the BI application and the data set being analysed. Embedded BI needs to be significantly easier to use, without training.

The BI I get needs to be embedded into my operational world and help me make better decisions in real time. To do that, it needs to be relevant, timely and actionable. Let's look at these three items in the context of an example: I am considering eating one more slice of toast at 6:50. My next bus is at 7:05 and the one after that is at 7:40. The bus journey takes 12 minutes, then I have a 5 minute walk, and I need to be at my desk at 8am to host a Web Conference. The fare is $2 and I only have a $20 bill in my pocket.

Timely - This is easy: I need the information before I put the toast in the toaster. If it comes after I put the toast in the toaster but before I start eating it, then I might still get my bus but have to leave the toast behind, wasted. If I get it even later I may miss my bus. As you can see, the later in the task I get the information, the worse the result is.

Relevant - I don't want generic information on my bus catching/missing history over the last 12 months. I want to know how often having a slice of toast 15 minutes before my bus is due causes me to miss my bus. Some information about how often the 7:05 bus arrives early would help too.

Actionable - In my bus example the action is pretty much down to me: I get up from the table and head out of the door. One action I might need to take is to break my $20 bill to make sure I have the correct change for the bus fare. Embedded BI would remind me of this and might even present the alternative option of borrowing $2 in change from my 5 year old's money box. (A further action might be to remind me when I get home to pay it back.)
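The arithmetic behind the bus example can be worked through in a few lines of Python. The times and durations come from the example itself; the helper names and the assumption that both buses leave exactly on schedule are illustrative.

```python
from datetime import datetime, timedelta

DAY = datetime(2000, 1, 1)  # arbitrary date; only the times of day matter


def at(hour: int, minute: int) -> datetime:
    return DAY.replace(hour=hour, minute=minute)


JOURNEY = timedelta(minutes=12)
WALK = timedelta(minutes=5)
DEADLINE = at(8, 0)


def slack_minutes(departure: datetime) -> int:
    """Minutes to spare at the desk if this bus is caught (negative means late)."""
    arrival = departure + JOURNEY + WALK
    return int((DEADLINE - arrival).total_seconds() // 60)


options = {
    "7:05 bus": slack_minutes(at(7, 5)),
    "7:40 bus": slack_minutes(at(7, 40)),
}
```

With these figures the 7:05 bus leaves 38 minutes to spare and the 7:40 bus leaves 3, so both arrive before 8:00 on paper. The narrow margin on the later bus is exactly the kind of detail that timely, relevant information should surface before the toast goes in.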

Agile Business Intelligence (BI)

Agile Business Intelligence (BI) refers to the use of Agile software development for BI projects to reduce the time it takes for traditional BI to show value to the organization, and to help in quickly adapting to changing business needs.

Agile BI enables the BI team and managers to make better business decisions, and to start doing this more quickly.

Agile methodology works on the iterative principle; this provides the new features of software to the end users sooner than the traditional waterfall process which delivers only the final product.

With Agile the requirements and design phases overlap with development, thus reducing the development cycles for faster delivery.

It promotes adaptive planning, evolutionary development and delivery, a time-boxed iterative approach, and encourages rapid and flexible response to change.

Agile BI encourages business users and IT professionals to think about their data differently, and it is characterized by a low Total Cost of Change (TCC).

With agile BI, the focus is not on solving every BI problem at once but rather on delivering pieces of BI functionality in manageable chunks via shorter development cycles and documenting each cycle as it happens. Many companies fail to deliver the right information to the right business managers at the right time.

Agile BI is a continual process and not a one-time implementation. Managers and leaders need accurate and quick information about the company, and business intelligence provides the data they need. Agile BI enables rapid development using the agile methodology. Agile techniques are a great way to promote development of BI applications, such as dashboards, scorecards, reports and analytic applications.

"Forrester Research defines agile BI as an approach that combines processes, methodologies, tools and technologies, while incorporating organizational structure, in order to help strategic, tactical and operational decision-makers be more flexible and more responsive to ever-changing business and regulatory requirements". According to the research by the Aberdeen Group, organizations with the most highly agile BI implementations are more likely to have processes in place for ensuring that business needs are being met.

Success of Agile BI implementation also heavily depends on the end user participation and "frequent collaboration between IT and the business".

Key Performance Criteria

Availability of timely management information – IT should be able to provide the right and accurate information in a timely manner so that business managers can make sound business decisions. “This performance metric captures the frequency with which business users receive their information they need in the timeframe they need it”.

Average time required to add a column to an existing report – Sometimes new columns need to be added to an existing report to see the required information. "If that information cannot be obtained within the time required to support the decision at hand, the information has no material value. This metric measure the total elapsed time required to modify an existing report by adding a column".

Average time required to create a new dashboard – This metric considers the time required to access any new or updated information, and it measures the total elapsed time required to create a new dashboard.
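The three criteria above can be computed from a simple log of BI requests. The sketch below assumes a hypothetical `Request` record with invented field names and made-up sample figures; it only shows how the metrics relate to the raw data.

```python
from statistics import mean
from typing import List, NamedTuple


class Request(NamedTuple):
    kind: str                # "add_column" or "new_dashboard"
    hours_to_deliver: float  # total elapsed time until delivery
    hours_needed_by: float   # how soon the business needed the result


def on_time_rate(requests: List[Request]) -> float:
    """Share of requests delivered within the timeframe the user needed."""
    on_time = sum(r.hours_to_deliver <= r.hours_needed_by for r in requests)
    return on_time / len(requests)


def average_hours(requests: List[Request], kind: str) -> float:
    """Average elapsed time for one kind of request."""
    return mean(r.hours_to_deliver for r in requests if r.kind == kind)


# Hypothetical sample data, for illustration only.
log = [
    Request("add_column", 4.0, 8.0),
    Request("add_column", 12.0, 8.0),
    Request("new_dashboard", 40.0, 48.0),
    Request("new_dashboard", 20.0, 24.0),
]
```

For this sample log, `on_time_rate(log)` is 0.75, the average time to add a column is 8 hours, and the average time to create a dashboard is 30 hours.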

Five Steps To Agile BI

Bruni, in her article 5 Steps To Agile BI, outlines the five elements that promote an Agile BI enterprise environment.

Agile Development Methodology – “need for an agile, iterative process that speeds the time to market of BI requests by shortening development cycles”.

Agile Project Management Methodology – continuous planning and execution. Planning is done at the beginning of each cycle, rather than one time at the beginning of the project as in traditional projects. In an Agile project, scope can be changed at any time during the development phase.

Agile Infrastructure – the system should have virtualization and horizontal scaling capability. This gives the flexibility to modify the infrastructure easily and can make near-real-time BI easier to maintain than with the standard extract, transform, load (ETL) model.

Cloud & Agile BI – Many organizations are now adopting cloud technology because it is a cheaper way to store and transfer data. Companies in the initial stages of implementing Agile BI should consider it, since BI and ETL software can now be provisioned as cloud services.

IT Organization & Agile BI – To achieve agility and maximum effectiveness, the IT team should not only interact with the business but also address business problems, and it should be a strong and cohesive team.

Advantages of using Agile BI

Agile BI drives its users to self-serve BI. It offers organizations flexibility in terms of delivery, user adoption, and ROI.

Faster to Deliver – Using Agile methodology, the product is delivered in shorter development cycles with multiple iterations. Each iteration produces working software that can be deployed to production.

Increased User Acceptance – In an Agile development environment, IT and business work together (often in the same room), refining the business needs in each iteration. "This increases user adoption by focusing on the frequently changing needs of the non-technical business user, leading to high end-user engagement, and resulting in higher user adoption rates".

Increased ROI – Organizations can achieve an increased return on investment (ROI) due to shorter development cycles. This minimizes the IT resources and time needed while delivering working, relevant reports to end-users.

BI on cloud


References

www.slideshare.com

https://www.utdallas.edu

https://en.wikipedia.org

http://www.dashboardinsight.com

https://blogs.oracle.com