white paper

A Revolutionary Approach for Advanced Analytics and Big Data Management

Aster Database: The First MPP Database with Applications Inside


Contents

Executive Summary

Introducing Aster Database: An MPP Database with Analytic Applications Running Inside

Breakthrough Performance and Scalability

Advanced In-Database Analytics with MapReduce

Accelerating Development of Advanced Analytics

Efficient Management for Both Data and Applications

Competing and Winning with Analytics

About Teradata Aster


Executive Summary

Organizations in every major market are turning to a new generation of advanced analytic applications that leverage huge volumes of data in new ways to provide deeper, smarter insights. Applications like fraud detection, customer behavior analysis, trending and forecasting, scenario modeling, service personalization and targeting, and deep click-stream analysis are increasingly being used to drive real-time decisions, increase revenue, and reduce costs.

Operating a business today without serious insight into business data is simply not an option. Competitive advantage depends on the ability to manage and analyze all the critical data entering a business environment. With over 100% data growth per year in many enterprise applications and with 70-80% of enterprises’ data residing outside a traditional data warehouse, addressing the “Big Data” challenge has become a top priority for companies.

Legacy architectures and approaches to data management and analytics are inherently unfit for today’s realities of Big Data and advanced analytics. The traditional database architectures and multi-tier data pipelines are under severe strain because they were simply not designed to store and process terabytes to petabytes of data nor to perform the advanced analysis that has become common today. Traditional databases struggle with the complexity and poor performance that result from trying to express rich analytics in SQL and provide only very limited capabilities for going beyond SQL’s limitations. Additionally, moving large volumes of data through the traditional data pipeline from the data warehouse to an analytic application for processing takes significant amounts of time and delays analysis of fresh data. The larger the volume of data, the larger the time and effort needed to move it from one tier to another.

These challenges are so severe that application developers and analysts are forced to compromise the richness and depth of their analyses. They first reduce Big Data to “small data” via aggregations, windowing, or sampling and then perform computations on a subset of the data rather than the entire data set. They also are forced to spend significant amounts of time writing, testing, and modifying complex analytic logic in order to fit it into the limitations of traditional databases. Lower-quality analytics are the result, for which the organization pays the price in poor decisions and lost revenue opportunities.

This paper explains how Aster Database solves the challenges of advanced analytics that can scale to Big Data with a monumental shift in the way data and analytics are processed and managed. Aster Database delivers a revolutionary architecture for analytics and data management that allows rich analytic applications to be pushed down into Teradata Aster's Massively Parallel Processing (MPP) database so that they can run where the data natively resides. With Aster Database, it is no longer necessary to push Big Data through a network to an overworked application server, build aggregates or random samples, or limit the richness of analytics to fit the limitations of the database. Instead, business users get the business-critical insights they need in the fastest possible time, taking advantage of Aster Database’s parallel processing of both data and analytics, with unprecedented simplicity and affordability.

Introducing Aster Database: An MPP Database with Analytic Applications Running Inside

A new generation of advanced analytics has become critical to daily business operations. Its impact continues to grow as new applications constantly emerge to help organizations run their business, manage their products and services, and interact with their customers at speeds and scales that were once inconceivable.

While these analytic applications are used in many different industries and business processes, they have a few key characteristics. One is that they process Big Data–terabytes to even petabytes–from far more sources and events than in the past. Enterprises are capturing more and more touch points and interactions leading to every customer decision, creating an explosion of data that needs to be examined and understood. New types of data such as clickstream, GPS, biometric, and RFID data are flowing into the organization at a tremendous rate. With millions or billions of events occurring on short notice, most of this data (typically 70-80%) lives outside the enterprise data warehouse. While the intelligence in this data is critical, most companies have not been able to leverage it until now.

These applications are also characterized by increased depth and richness of analysis. They typically require access to full data sets rather than samples or aggregations in order to provide valuable insight–for example, detecting fraud requires looking at all transaction data in order to identify outliers or unusual patterns. Further, techniques such as statistical analysis, graph analysis, predictive modeling, and collaborative clustering are increasingly used to drive critical applications. They use techniques that are far more complex than what traditional Business Intelligence (BI) applications were designed for, such as simulations that weigh thousands or millions of possible variables and models that morph and change over time.

Finally, these new applications are dynamic: rather than simply reporting on past events, the new generation of advanced analytics is a critical tool in helping organizations make the best decisions as events unfold by predicting likely outcomes. As a result, rather than only a few statisticians and analysts accessing the system occasionally to run reports, many business analysts could be accessing the system constantly. These analysts require rapid results both for the interactive, ad hoc analysis that is crucial for data exploration and for their recurring analytical workloads.


Aster Database is designed from the ground up to meet these challenges. Aster Database is a massively parallel (MPP) row and column database with an integrated analytics engine. This combination, called a “massively parallel analytics platform,” provides the industry’s most powerful platform for high-performance advanced analytics on terabytes to petabytes of data, greatly improving the ability of enterprise users to make informed decisions. Teradata Aster’s approach addresses major limitations of traditional data warehousing and business intelligence systems by making it easy to create advanced analytics and then embed 100% of analytic processing inside the database so that analytics are co-located with the data to drive rapid analysis, eliminate massive data movement, and avoid forced sampling. Combined with a visual development environment, easy administration, and the exceptional reliability required by business-critical applications, Aster Database makes it possible for organizations to perform rich and deep analysis of large data sets at ultra-fast speeds so that they can leverage their data in ways that were previously impractical or impossible.

Aster Database is available in a flexible set of offerings for deployment on premise or in the cloud:

• Aster Database packaged software provides Teradata’s analytic platform for installation on any certified commodity hardware.

• The Teradata Aster MapReduce Data Warehouse Appliance combines server hardware with Aster Database and valuable third-party software to simplify purchase, deployment, and configuration.

• The Aster Database Cloud Edition provides cloud integration between Aster Database and Amazon Web Services (AWS), AppNexus, Dell’s Data Cloud, and Terremark platforms. It is ideally suited to fully use the elasticity, scalability, and persistence of cloud computing.

Breakthrough Performance and Scalability

Pervasive Parallelism

Aster Database is a Massively-Parallel Processing (MPP) database that provides end-to-end parallelism of data and analytic processing, allowing organizations to examine very large data sets with unprecedented granularity and depth of analysis. The result: 10–100 times faster performance than other architectures and scalability to terabytes and even petabytes of data.

Aster Database is the first MPP database to deliver pervasive parallelism, independently parallelizing all functions rather than simply parallelizing a few functions and leaving others to run sequentially over a shared resource. Aster Database executes loads, queries, exports, backups, recoveries, installs and upgrades in parallel to take full advantage of all resources, optimizing performance for all data warehouse and analytic operations. Each function can be scaled independently simply by adding additional commodity servers at any time. Independent parallelization and scaling of each function prevents bottlenecks from occurring anywhere in the data management and data processing lifecycle.

Pervasive parallelism is delivered by Aster Database’s internal architecture. Aster Database consists of four separate classes of nodes that reside on commodity servers: Queens, Workers, and Loaders, as well as an independent Backup Cluster as illustrated in Figure 1 below.

Figure 1: Aster Database’s pervasive parallelism provides end-to-end parallelism and optimized performance for all data warehouse and analytics operations

Queen nodes provide the external interface to the data warehouse. End users and database administrators can connect to a Queen through ODBC/JDBC, while systems administrators monitor Aster Database through the Aster Management Console (AMC). The Queen nodes are also responsible for coordinating the cluster servers in query processing, result aggregation, and failure handling.

Worker nodes are responsible for parallel execution of queries and in-database analytics, directed by one or more Queen nodes. Worker nodes also store partitions of data and replicas of data that reside on other Worker nodes. Finally, Worker nodes participate in maintenance tasks (e.g., indexing, load balancing) initiated by Queen nodes.

Loader nodes are responsible for rapid loading and partitioning of new data into the Worker nodes. Loaders can perform both trickle-feed loading for granular data loading and bulk loading, which has been shown to load up to eight terabytes of fresh data per hour. Additionally, Loader nodes can export data for use in other systems.


Backup nodes are responsible for backing up compressed Aster Database user data and metadata. Backup servers provide local data protection (over a LAN) and remote disaster protection (over a WAN) and offer multiple backup options including full backup, incremental backup, and logical table-level backup. All backups and recoveries are performed in parallel, running non-disruptively in the background across all backup servers.
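The division of labor among Loader, Worker, and Queen nodes described above can be illustrated with a small Python sketch. It is not Aster Database code: the `Worker` class, the `load` and `queen_count` functions, the CRC32 hash, and the next-neighbor replica placement are all assumptions chosen for illustration, and the fan-out runs sequentially here rather than in parallel.

```python
from zlib import crc32


class Worker:
    """Hypothetical worker: holds its own partition plus a replica of a neighbor's."""

    def __init__(self, name):
        self.name = name
        self.primary = []  # rows this worker owns
        self.replica = []  # copies of rows owned by another worker

    def count_where(self, predicate):
        # Each worker scans only the rows it owns.
        return sum(1 for row in self.primary if predicate(row))


def load(rows, workers, key):
    """Loader role (assumed policy): hash-partition rows and replicate each one."""
    n = len(workers)
    for row in rows:
        idx = crc32(str(row[key]).encode()) % n
        workers[idx].primary.append(row)
        workers[(idx + 1) % n].replica.append(row)  # replica on the next worker


def queen_count(workers, predicate):
    """Queen role: send the query to every worker, then aggregate partial counts."""
    return sum(w.count_where(predicate) for w in workers)


workers = [Worker(f"worker-{i}") for i in range(4)]
rows = [{"user_id": i, "amount": i * 10} for i in range(1000)]
load(rows, workers, key="user_id")
total = queen_count(workers, lambda r: r["amount"] >= 5000)
```

Because every row lands on exactly one primary partition, the sum of the per-worker counts equals the count over the whole table, and the replica lists hold a second copy of every row on a different worker.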

Another advantage of the Aster Database architecture is the ability to allocate heterogeneous hardware for different tiers. For example, the Loader tier may leverage CPU- or memory-heavy servers with only one or two small-capacity disk drives since data is not persistently stored on Loader nodes. In contrast, Backup nodes may leverage servers with 48 large-capacity drives since the Backup tier is focused on optimizing backup cost per gigabyte of data. By offering the flexibility to allocate the appropriate server hardware for each tier on demand, enterprises not only ensure optimal SLAs for that particular function, but also ensure lower costs.

Unlimited Linear Scalability

Aster Database's unique Online Precision Scaling™ enables Aster Database to achieve breakthrough performance with linear scale-out to terabytes and petabytes of user data. Architected to take advantage of ongoing advances in large-scale distributed computing, Aster Database provides linear scalability for loads, queries, and backups, independently or in unison to meet requirements.

Aster Database’s multi-tier architecture allows each tier – query processing, backup, and loading – to be independently scaled, providing the flexibility to scale each function cost-effectively as needed based on workload characteristics rather than be forced to scale the entire system to address a single bottleneck. Capacity can be added to functions independently, offering significant flexibility and Total Cost of Ownership (TCO) advantages compared to scaling the entire system. For example, if data volumes or processing requirements grow, the number of Worker servers can be increased; if faster loads or higher volume loads or exports are desired, more Loader servers can be provisioned; if backup retention policies lengthen or backup volumes grow, the number of backup servers can be increased. This architecture also allows you to choose the most cost-effective hardware for each function. For example, functions such as backup that occur less frequently and are more storage-intensive than CPU-intensive can run on lower-cost hardware configured with extra storage.

When new resources are added, Aster Database is designed to take full advantage of those resources with maximum parallelism. Aster Database’s capabilities for granular splitting and load balancing of virtual partitions ensure that new resources are efficiently leveraged to deliver increased performance. Aster Database’s patent-pending, dual-stage query optimizer ensures use of maximum resources as the system scales and processing demands grow. After optimizing each query globally across all MPP nodes, local optimization on each partition fine-tunes processing at a local level. This ability to adapt on-the-fly to the latest resource use ensures the fastest possible response even at massive scale.
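One way to picture load balancing of virtual partitions is the following Python sketch. The paper does not describe the actual algorithm, so the greedy policy, the `rebalance` function, and the node names are hypothetical. The idea shown is only that data is divided into many small virtual partitions, so a new server can take over a few of them without reshuffling everything.

```python
def rebalance(assignment, nodes):
    """Move virtual partitions from the busiest to the idlest node until the
    per-node counts differ by at most one. Returns the number of moves."""
    by_node = {n: [] for n in nodes}
    for vp, node in assignment.items():
        by_node[node].append(vp)
    moves = 0
    while True:
        busiest = max(nodes, key=lambda n: len(by_node[n]))
        idlest = min(nodes, key=lambda n: len(by_node[n]))
        if len(by_node[busiest]) - len(by_node[idlest]) <= 1:
            return moves
        vp = by_node[busiest].pop()
        by_node[idlest].append(vp)
        assignment[vp] = idlest  # record the new owner of this partition
        moves += 1


nodes = ["w0", "w1", "w2"]
assignment = {vp: nodes[vp % 3] for vp in range(12)}  # 4 partitions per node
nodes.append("w3")                                     # a new server joins
moved = rebalance(assignment, nodes)
```

In this run only 3 of the 12 virtual partitions move, one from each existing node, and all four nodes end up with 3 partitions each.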

Aster Database’s unique architecture also provides administrators with the critical ability to scale to hundreds of diverse workloads and users while ensuring predictable and consistent performance. Aster Database’s Dynamic Mixed Workload Management capabilities allocate scheduling and resources to ensure consistent performance even as the number of users and workloads grows. In addition, Aster Database’s multi-tier architecture ensures that heavy processing in one area does not impact other areas. For example, load execution is kept independent of query execution, which is kept independent of export or backup execution.

Aster Database delivers this scalability with exceptional ease of use and flexibility. Adding more capacity is a simple matter of plugging a new commodity server into the local network and performing one-click incorporation through the Aster Management Console. The system automatically recognizes the new resources and rebalances the workload. During peak load windows a system administrator has the flexibility to dynamically re-provision Worker nodes into Loader node identities to provide additional loading resources for load balancing. In peak query windows, the reverse can occur. No additional outside servers are required, enabling cost-effective task-based scaling without disruption.

Dynamic Workload Management for High Concurrency and Predictable Service Levels

As companies have come to rely on data-driven applications and ultra-fast analytics across all sectors of the organization, data warehouses are now expected to serve a broad range of applications and users in their daily business operations. Many of these users require rapid results and low latencies in order to support urgent analysis as well as ad hoc and exploratory analysis. At the same time, other tasks such as loading and backups that are running in the background need access to resources but cannot be allowed to disrupt other important workloads.

Delivering consistent, scalable performance with these many concurrent users and workloads without disruptions is critical to enabling large-scale advanced analytics. When hundreds or thousands of mixed workloads are executing simultaneously, it becomes increasingly difficult to prioritize and intelligently allocate the right amount of system resources to the right workloads at the right time. Traditional manual database tuning cannot possibly keep pace with rapidly changing workload demands. Manual tuning is not only slow and complex, it is also very expensive: industry surveys have consistently shown that up to 80% of TCO is attributed to ongoing maintenance, dwarfing initial hardware acquisition costs. Systems need to optimize performance and cost, in particular avoiding duplication of data and systems just to separate different workloads.

Aster Database addresses these challenges with automated tools that keep the system running efficiently even under peak demands, saving a great deal of administrative time that would be required to tune a traditional system. Aster Database includes a Dynamic Mixed Workload Manager, the industry’s most advanced workload management capability for large-scale distributed computing on commodity hardware. Intuitive, fine-grained policy controls allow administrators to define and manage diverse workloads to meet the organization’s business priorities. Using the Workload Manager (Figure 2), administrators can define rules that reallocate resources on the fly across hundreds of distributed nodes to adapt to new workloads and changing priorities in real time. The result is highly predictable performance and guaranteed service levels for the complex mixed workloads of an enterprise data warehouse and analytic-intensive applications.

Workload management rules are easily created and managed using the Teradata Aster Management Console. Rules are written as easy-to-read SQL predicates, eliminating complex tuning.

Figure 2: Rules-based resource reallocation for different constituents and workloads
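The idea of expressing workload rules as SQL predicates can be sketched in Python. This is not the Aster Management Console rule syntax, which the paper does not show: the session attributes (`username`, `application`, `hour`), the priority labels, and the first-match policy are assumptions, and SQLite stands in purely as a convenient way to evaluate a predicate.

```python
import sqlite3

# Hypothetical rules: (SQL predicate over session attributes, priority class).
RULES = [
    ("username = 'etl' AND hour BETWEEN 0 AND 5", "high"),
    ("username = 'etl'", "low"),
    ("application = 'dashboard'", "high"),
]
DEFAULT_PRIORITY = "medium"


def classify(session, rules=RULES):
    """Return the priority of the first rule whose predicate matches the session."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE session (username TEXT, application TEXT, hour INTEGER)"
        )
        conn.execute(
            "INSERT INTO session VALUES (:username, :application, :hour)", session
        )
        for predicate, priority in rules:
            # Predicates are trusted administrator configuration in this sketch.
            row = conn.execute(f"SELECT 1 FROM session WHERE {predicate}").fetchone()
            if row:
                return priority
        return DEFAULT_PRIORITY
    finally:
        conn.close()


night_load = classify({"username": "etl", "application": "loader", "hour": 3})
day_load = classify({"username": "etl", "application": "loader", "hour": 14})
ad_hoc = classify({"username": "analyst", "application": "notebook", "hour": 14})
```

Here the same load job is classified as high priority overnight and low priority during business hours, and a session that matches no rule falls back to the default class.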

Advanced In-Database Analytics with MapReduce

As enterprises struggle to manage and leverage exploding data volumes, they also face a lack of capabilities within the traditional data warehouse for processing and scaling advanced analytics. This includes event-based analytic applications like fraud detection that execute within a business process and responsive, exploratory analytic applications on fine-grained data.

Advanced analysis has traditionally required complex coordination and interaction among specialized analysts and data programmers to extract samples of data from large data sets, appropriately transform the data for advanced analysis, build analytical models using data mining toolsets and then develop scoring programs in proprietary database languages to execute data mining model results on the complete data set. Some analysts note that up to 70% of the time on data mining projects can be spent on preparing data for analysis, just to get the data into a usable state.

Adding to these challenges, traditional data warehouses were designed for SQL, a useful declarative language for simple queries and many administrative tasks but one that has significant limitations for expressing rich analytic operations. Common types of analysis such as time series analysis, clickstream analysis, graph analysis, and the like are generally highly complex to express in SQL as well as being difficult for SQL to process and scale with acceptable performance. Although some databases have basic in-database capabilities such as stored procedures or User-Defined Functions (UDFs), these approaches have significant limitations in flexibility, richness, and performance that limit their ability to perform advanced analysis.

As a result of these obstacles, enterprise IT organizations are forced to export data from their database to an external analytic application for processing. In addition to the cost of storing and processing data in multiple systems, this architecture presents significant challenges that only worsen over time:

• High data latency: moving data between the data warehouse and the analytics server can take many hours and require significant network bandwidth.

• High processing latency: most analytic applications run on symmetric multiprocessing (SMP) systems that process data serially with limited CPU and storage resources, taking hours to days to finish.

• Limited insights: sampling or aggregation is typically required to avoid overwhelming the analytics server and network, but yields much weaker insights than analyzing the full data set because it can fail to include outliers or uncommon events. Analytics done inside the database in order to analyze full data sets struggle to express advanced logic as a result of the limitations of SQL and of traditional in-database processing capabilities.
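The data-latency and limited-insights points above can be made concrete with a little arithmetic. The figures in this Python sketch (10 TB, a 1 Gb/s link, a 1% sample, 50 rare events) are illustrative assumptions rather than numbers from this paper, and the transfer estimate ignores protocol overhead and contention.

```python
def transfer_hours(terabytes, gigabits_per_second):
    """Hours needed to move `terabytes` (decimal TB) over a link at full line rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 3600


def miss_probability(sample_fraction, rare_events):
    """Chance that an independent random row sample contains none of the rare events."""
    return (1 - sample_fraction) ** rare_events


hours = transfer_hours(10, 1)        # 10 TB over a 1 Gb/s link
p_miss = miss_probability(0.01, 50)  # 1% sample, 50 fraudulent rows in the table
```

Under these assumptions the export alone takes roughly 22 hours, and a 1% sample has about a 60% chance of containing none of the 50 fraudulent rows.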

In-Database Processing of 100% of Analytic Computations

With Aster Database, Teradata Aster has taken a unique approach to solving the cost and performance challenges of running advanced analytics against Big Data. Teradata Aster is the first commercial vendor to bring to market a unified solution that seamlessly embeds analytic applications with the data where it natively resides for ultra-fast and ultra-deep analytics and data processing. By avoiding the need to move data from the database to the analytic application for processing, Teradata Aster overcomes critical challenges and limitations of the traditional analytics architecture. This new approach to analytic application processing delivers orders of magnitude better performance and scalability for advanced analytics including exploratory and ad hoc analysis, providing analysts and businesses the freedom to examine the largest data sets and gain unique analytical insights in the shortest time possible.

Aster Database’s unique Applications-Within® architecture delivers this breakthrough by processing full analytic logic inside the database, leveraging Aster Database’s massively-parallel architecture and patent-pending SQL-MapReduce® (SQL-MR) framework to fully parallelize processing for ultra-fast analysis of massive data sets. With Aster Database’s Applications-Within architecture, all data stored in the database is available to all in-database analytics, eliminating the need to move data between the database and analytic applications. Rather than spending hours to weeks processing Big Data, Aster Database distributes application processing and data processing across commodity hardware for results in minutes to seconds. This frees up analysts from the complexity and delays of data preparation and data movement tasks so that they can spend more time on data exploration and richer analytics, resulting in deeper and more accurate analysis and allowing a dramatic reduction in cycle time.

The Teradata Aster architecture makes it possible for any analytic application, custom or packaged, to be pushed into the database without requiring rewriting (see Figure 3). Embedded applications have access as a first-class citizen to all of the services available in the database including memory management, workload management and fault tolerance. Teradata Aster’s approach of running applications within the database goes beyond what is provided by traditional stored procedures and User-Defined Functions (UDFs). Aster Database provides a complete application execution environment, separate from data management, so that applications have the resources required to execute with maximum performance and stability. Database services including failover, performance, Information Lifecycle Management (ILM), and dynamic workload management are optimized specifically to meet the requirements for both analytics execution and high-performance, scalable data management.


Figure 3: Aster Database embeds applications with data in a unified MPP platform for scalable data management and ultra-fast analysis of large volumes of data

Automatically Parallel Analytics via SQL-MapReduce

Until recently, massively parallel processing of data required extremely specialized programming skills to design for parallelization. The MapReduce framework popularized by Google has rapidly emerged as a standard way to simplify parallelization, but does require specialized developers who are experienced with its programming paradigm. Teradata Aster has dramatically increased the accessibility and ease of use of MapReduce for analytics by coupling SQL with MapReduce in the patent-pending SQL-MapReduce (SQL-MR) framework. The SQL-MapReduce framework combines the analytic power of MapReduce with the familiarity of SQL, so that any business analyst can leverage the power of MapReduce from any SQL statement without needing to learn MapReduce programming or parallel programming concepts. Using SQL-MR, any analytics code running in-database or any MapReduce function can be incorporated into an analytic application through a SQL statement. Aster Database automatically parallelizes the application processing using MapReduce so that any in-database application runs in a massively parallel processing environment supported by commodity hardware.

The SQL-MapReduce approach to parallelizing analytics is a significant technological innovation. As queries running in a traditional database become more complex, including multi-pass statements or exceeding the limitations of SQL, parallelizing analytic logic becomes much more complicated. No longer is it sufficient to replicate a query across multiple nodes and aggregate the output. Parallelization of complex queries requires designing queries for parallelism, a challenge addressed by the SQL-MapReduce framework and the pre-defined SQL-MapReduce functions that Teradata Aster provides in Aster MapReduce Portfolio. The SQL-MR framework makes it possible for Aster Database to seamlessly assemble, parallelize and execute standard ANSI SQL, custom functions, and packaged analytic applications, all in an extremely cost-effective software platform that’s easy to use and manage.
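A simple average already shows why replicating a query on every node and combining the outputs is not enough. The Python sketch below uses made-up partitions and hypothetical function names. It compares the naive approach with a two-phase computation in which each partition emits a partial result designed to be merged.

```python
# Three partitions of unequal size, as they might sit on three nodes.
partitions = [[10, 20, 30, 40], [100], [5, 15]]


def naive_mean(parts):
    """Run the same AVG on every node and average the outputs (incorrect)."""
    local = [sum(p) / len(p) for p in parts]
    return sum(local) / len(local)


def two_phase_mean(parts):
    """Map each partition to (sum, count), then reduce the partial results."""
    partials = [(sum(p), len(p)) for p in parts]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


wrong = naive_mean(partitions)
right = two_phase_mean(partitions)
```

The naive version returns 45.0, while the true mean of the seven values is about 31.4. The partial results have to be chosen so that they can be merged, and that is the kind of design work that grows harder as the analytic logic becomes richer.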

Using SQL-MR, developers can write powerful and highly expressive functions in a variety of languages including Java, C, C++, C#, Python, and R and push them into the database for advanced in-database analytics (see Figure 4). Additionally, pre-packaged analytic toolsets for business intelligence or data mining that use standard SQL can natively access a MapReduce enabled analytic application without any code changes, making the power of MapReduce easily and transparently accessible to business analysts.

Figure 4: Seamless execution of SQL and SQL-MR functions (using Java or other language of choice) inside the Aster Database massively parallel analytic platform
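The programming model can be approximated in plain Python to show its shape. This is an analogy, not the SQL-MR API: in Aster Database such functions are invoked from a SQL statement and run inside the database, whereas here `run_partitioned`, `sessionize`, the 30-minute timeout, and the thread pool are assumptions made for illustration. The developer writes only the per-partition logic, and the framework groups the rows by key and runs the function on each group in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def sessionize(rows, timeout=1800):
    """Partition function: tag each click of one user with a session number."""
    out, session, last_ts = [], 0, None
    for user, ts in sorted(rows, key=itemgetter(1)):
        if last_ts is not None and ts - last_ts > timeout:
            session += 1  # gap longer than the timeout starts a new session
        out.append((user, ts, session))
        last_ts = ts
    return out


def run_partitioned(rows, key_index, func):
    """Group rows by key, apply func to every group in parallel, merge the output."""
    ordered = sorted(rows, key=itemgetter(key_index))
    groups = [list(g) for _, g in groupby(ordered, key=itemgetter(key_index))]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(func, groups))
    return [row for part in results for row in part]


clicks = [("ann", 0), ("bob", 10), ("ann", 100), ("ann", 5000), ("bob", 20)]
sessions = run_partitioned(clicks, 0, sessionize)
```

In this run Ann's third click arrives more than 30 minutes after her second, so it opens her session 1, while both of Bob's clicks stay in his session 0. Nothing in `sessionize` refers to threads or nodes, which is the property that lets a framework parallelize it automatically.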

The ability to apply MapReduce to data from diverse sources with varying degrees of structure provides groundbreaking potential for unique, provocative insights to enable enterprises to gain competitive advantage. Analytics that were unimaginable in the past are now easy to achieve and execute in seconds.

A few real-world examples of applications that benefit from this architecture include:

• Predictive and granular forecasting

• Trend analysis and modeling

• Sequential pattern analysis (e.g. fraud detection, attribution, or behavior analysis)

• Time series analysis (e.g. financial trading and risk modeling)

• Graph analytics (e.g. network optimization, human intelligence, “influencer” marketing)

• Text analytics (“voice of the customer” for improved customer satisfaction and retention)

• Statistical machine learning algorithms (linear regression, k-means clustering, SVMs, etc.)

• Transformation pre-processing (e.g. aggregations and other cleansing or normalization routines)

• And many others

Accelerating Development of Advanced Analytics

Development of advanced analytic applications has traditionally been hindered by complexity and inefficiency, delaying deployment and forcing analysts to spend significant effort on the mechanics of development rather than on the application logic that delivers insights for the business questions they are trying to answer. Teradata Aster provides an intuitive, fully-integrated development environment and a suite of powerful MapReduce analytics functions to dramatically simplify and accelerate the development, testing, and deployment of rich analytic applications. These tools make it possible to build rich analytics applications not in weeks or months but in days due to the simplicity of SQL-MapReduce and Teradata Aster’s extensive suite of pre-built rich analytics functions.

Visual Development Environment

Aster Developer Express accelerates development, validation, and deployment of advanced analytic applications by providing the first integrated visual environment for developing analytic applications with SQL and MapReduce.

Intuitive development: Developer Express integrates with the popular Eclipse Integrated Development Environment (IDE) to enable developers to write applications that leverage SQL and MapReduce in a rich graphical development environment. Using Developer Express, developers can easily write, compile, and validate analytic applications; transfer existing code to SQL-MR applications; and leverage the pre-built analytics in Aster MapReduce Portfolio. Developer Express also includes SQL-MR wizards that automatically package custom analytic logic for push-down into Aster Database, enabling developers to focus on valuable analytic logic rather than on the mechanics of integrating that logic with the database.

Rapid testing: Developer Express enables rapid, frequent testing by allowing developers to validate their application code on their desktop without requiring access to a running database. Developers can launch tests of their applications from within the graphical development environment, using the desktop test environment to simulate application processing, and then view the results of their test within the Eclipse IDE.

One-click deployment: Developers can embed their completed applications in the Aster Database with a single click directly from the IDE. Rather than spending time working through the mechanics of embedding their applications in the database, developers and administrators can focus on higher-value development.

Figure 5: Aster Developer Express makes advanced analytics on big data easy with the first integrated environment for developing, testing, and deploying advanced in-database analytics

Optimized MapReduce Analytic Functions

Aster MapReduce Portfolio accelerates the development of rich analytics with the first suite of analytic functions built for in-database MapReduce. These powerful ready-to-use functions are optimized to take advantage of Teradata Aster’s SQL-MR framework for advanced in-database processing. Using the functions in Aster MapReduce Portfolio makes it simple to rapidly create advanced analytics that leverage the performance and scalability enabled by Aster Database’s in-database MapReduce and SQL-MR.

The Aster MapReduce Portfolio components provide powerful, pre-tested analytic functions that can be plugged into a wide variety of applications. For example, nPath is a sequential and trending pattern analysis framework built on SQL-MapReduce that discovers relationships between rows of data that usually cannot be expressed through SQL. The ability to invoke a simple nPath extension to the SQL language leverages the compute power of Aster Database to greatly increase query performance. Applications include customer shopping sequences, telephone calling patterns, stock market trading sequences, and more. Other examples of types of functions in Aster MapReduce Portfolio include:

• Path analysis functions to discover patterns in rows of sequential data for use in scenarios including time-series analysis, predictive analytics, and web analytics such as click-stream analysis.

• Statistical analysis functions to perform high-performance processing of common statistical calculations for use in a variety of applications including analysis of portfolios, market prices, consumer behavior, and security.

• Relational analysis functions for discovering important relationships among transaction, graph, or text data for use cases that include retail optimization, network analysis, and log file analysis.
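To make the idea of sequential pattern analysis concrete, here is a minimal Python sketch of matching a pattern against ordered rows. It is not nPath and does not reproduce nPath syntax; the symbol table, the helper names `paths_by_user` and `match_paths`, and the sample events are all assumptions made for illustration. Each user's rows are ordered by timestamp, encoded as a string of one-letter symbols, and tested against a regular expression, which expresses a relationship between rows that a single-row SQL predicate cannot.

```python
import re
from collections import defaultdict

# Hypothetical one-letter codes for page types, so a path becomes a string.
SYMBOLS = {"home": "H", "search": "S", "product": "P", "checkout": "C"}


def paths_by_user(events):
    """Build each user's path as a string of symbols, ordered by timestamp."""
    per_user = defaultdict(list)
    for event in events:
        per_user[event["user"]].append((event["ts"], SYMBOLS[event["page"]]))
    return {
        user: "".join(symbol for _, symbol in sorted(seq))
        for user, seq in per_user.items()
    }


def match_paths(events, pattern):
    """Return the users whose entire path matches the regular expression."""
    compiled = re.compile(pattern)
    return sorted(
        user
        for user, path in paths_by_user(events).items()
        if compiled.fullmatch(path)
    )


# Sample rows arrive unordered, as they might from a clickstream table.
events = [
    {"user": "u1", "ts": 3, "page": "product"},
    {"user": "u1", "ts": 1, "page": "home"},
    {"user": "u1", "ts": 4, "page": "checkout"},
    {"user": "u1", "ts": 2, "page": "search"},
    {"user": "u2", "ts": 1, "page": "home"},
    {"user": "u2", "ts": 2, "page": "product"},
    {"user": "u3", "ts": 1, "page": "search"},
    {"user": "u3", "ts": 2, "page": "product"},
    {"user": "u3", "ts": 3, "page": "checkout"},
]

# Paths that start on the home page and end at checkout.
home_to_checkout = match_paths(events, r"H.*C")
# Paths that end with a product view followed directly by checkout.
product_then_checkout = match_paths(events, r".*PC")
```

Since each user's path is evaluated independently, this kind of matching partitions naturally by user, which is the property that makes it a candidate for parallel in-database execution.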

Efficient Management for Both Data and Applications

Aster Database automates and simplifies important aspects of monitoring and managing availability and performance, not only for data queries but also for in-database analytic applications. As the system scales from three to 30 to 300 servers or more, Aster Database’s management capabilities ensure that administrative effort and costs remain minimal.

Always-On Fault Tolerance and Online Administration

Downtime, whether due to system failures or to maintenance operations, is disruptive to both administrators and users, particularly when analytics are integral to the business and people need to access the system around the clock. Traditional systems often require downtime or disruption for routine operations such as loading data, backing up data, or scaling up the system. Avoiding planned downtime requires a highly available architecture that can process loading, backups and similar tasks without disrupting performance. Fault tolerance is also critical so that operations are not affected in the event of hardware failures, software failures, or user errors. As the size of the system scales out to accommodate data growth, innovative approaches are needed to maintain application availability in the face of increasing risk of failures in a component of the system.

Aster Database’s “Always-On” architecture is designed to minimize and avoid both planned and unplanned downtime. “Live Administration” enables non-disruptive operations including online scaling, simultaneous load and export during queries, online backup and recovery, and online restoration, eliminating downtime or disruption traditionally required for these routine tasks. Aster Database is also designed from the ground up to avoid unplanned outages due to hardware and software failures, user or administrator error, and local or regional disasters. Aster Database leads the industry in massive-scale fault tolerance with replication, automatic failover, NIC bonding, failure heuristics, and clustered backup to prevent unplanned downtime due to hardware or software failures. In the event of hardware or software failure, Aster Database’s patent-pending Recovery-Oriented Computing (ROC) capabilities and innovations in online data redistribution ensure real-time recovery.

Aster Database’s “Always-On” architecture enables a massively scalable platform with continuous availability using standard, off-the-shelf commodity servers. Aster Database can process queries with consistently high performance even when it is:

• Experiencing a hardware failure

• Recovering from a hardware failure

• Adding capacity

• Performing backups

• Loading data

• Exporting data

Aster Database is uniquely designed to deliver this level of mission-critical resilience against both unplanned and planned downtime on MPP commodity hardware.
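The general idea of replication with automatic failover can be illustrated with a short Python sketch. It is a generic illustration under assumptions, not a description of how Aster Database implements fault tolerance; the `Replica` class, the `scan_with_failover` function, and the node names are hypothetical. Each data partition is held by more than one node, and a read falls through to the next copy when the first one is unavailable.

```python
class Replica:
    """Hypothetical stand-in for a node holding a copy of one data partition."""

    def __init__(self, name, rows, healthy=True):
        self.name = name
        self.rows = rows
        self.healthy = healthy

    def scan(self):
        """Return the partition's rows, or fail if the node is down."""
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return list(self.rows)


def scan_with_failover(replicas):
    """Try each copy in order and return (node name, rows) from the first healthy one."""
    errors = []
    for replica in replicas:
        try:
            return replica.name, replica.scan()
        except ConnectionError as exc:
            errors.append(str(exc))
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


# The primary copy is down, so the read is served by the backup copy.
primary = Replica("node-1", [1, 2, 3], healthy=False)
backup = Replica("node-2", [1, 2, 3])
served_by, rows = scan_with_failover([primary, backup])
```

The caller receives the same rows either way, which is the sense in which a query can continue while a component has failed.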

Powerful Console for Visibility and Control

The Aster Management Console (AMC) provides rich visibility into and control of the Aster Database platform and the applications running inside, making it easy to configure, manage and monitor data, applications, users, and infrastructure.

The AMC’s intuitive web-based graphical interface enables easy monitoring with summary dashboards, graphical views of query and process execution, and easy access to common administrative operations. Dashboards provide at-a-glance visibility and easy drill-down into system status, application metrics, and query performance. Single-click scaling and point-and-click access to workload management policies and backup processes further streamline and simplify management.

Figure 6: The Aster Management Console (AMC) provides deep visibility, monitoring, and control of data and analytic application processing in an intuitive graphical console

Competing and Winning with Analytics

Aster Database revolutionizes the ability to capture critical intelligence from the huge data volumes flowing into organizations. For enterprises that depend on advanced analysis of large data volumes to drive daily operations and profitability, there’s simply nothing else like it.

With Aster Database, Teradata Aster opens up a new world of high-performance, scalable analytics that was previously out of reach for most companies. In this new world, analytic functions run natively where the data resides, in parallel across hundreds or thousands of processing instances (as many as you require) for dramatic performance gains. The ability to quickly build rich analytic functions and push them into the MPP database with a single click allows companies to leverage the insights hidden in their data in powerful new and actionable ways.

Aster Database is ideal for any analytics that require sophisticated analysis that can scale to massive data sets: from data mining to credit scoring, risk modeling, ad targeting, fraud detection, cross-sell/up-sell bundling, and many more. Hundreds of concurrent workloads and users, whether running simple queries or the most complex analytics, execute with phenomenal speed, availability, and scalability.

Welcome to the new world of advanced analytics and Big Data, a world in which deep analytic insights gathered from many data sources and large volumes of data enable data-driven business decisions that deliver competitive advantage with less work and lower costs.

About Teradata Aster

The Teradata Aster MapReduce Platform is the market-leading big data analytics solution. This analytic platform embeds MapReduce analytic processing for deeper insights on new data sources and multistructured data types to deliver analytic capabilities with breakthrough performance and scalability. Teradata Aster’s solution utilizes Aster’s patented SQL-MapReduce to parallelize the processing of data and applications and deliver rich analytic insights at scale. For more information, visit www.asterdata.com or for more about Teradata, visit teradata.com.