

EVOLVING ON-DEMAND INFRASTRUCTURE FOR HADOOP 2.0 AND YARN

Prime Dimensions' Hadoop 2.0 Big Data infrastructure includes YARN and Tez, which together provide a distributed data operating system and development platform. They extend the batch processing functionality of MapReduce by allowing multiple types of applications to be deployed directly across Hadoop clusters. YARN and Tez represent a paradigm shift in how Big Data is processed, managed and analyzed.

The real benefit of YARN is that it allows Hadoop clusters to execute workloads beyond MapReduce. With YARN, Hadoop now has a generic resource-management and distributed-application framework in which multiple data processing applications can run natively. YARN improves Hadoop's extensibility and scalability by splitting the two roles of the Hadoop 1.x JobTracker into separate processes: (1) a global ResourceManager, which controls access to the cluster's resources (memory, CPU, etc.), and (2) a per-application ApplicationMaster, which negotiates resources from the ResourceManager and controls task execution.
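To make this division of labor concrete, here is a toy Python model of the two roles. It is a minimal sketch under stated assumptions: the classes `ResourceManager`, `ApplicationMaster` and `Container` borrow YARN's terminology but are hypothetical stand-ins, not YARN's actual (Java) API, and the first-fit placement is a simplification chosen for brevity rather than what YARN's schedulers do. The ResourceManager only tracks and grants memory and vcores; each ApplicationMaster obtains containers for its own tasks and decides what runs in them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical toy model of YARN's two roles; NOT the real YARN API.


@dataclass
class Container:
    """A bundle of memory and vcores on one node, granted to one application."""
    node: str
    memory_mb: int
    vcores: int


class ResourceManager:
    """Cluster-wide arbiter: tracks free resources per node and grants containers.

    It never looks inside an application; task logic belongs to the ApplicationMaster.
    """

    def __init__(self, nodes: dict[str, tuple[int, int]]) -> None:
        # node name -> [free memory in MB, free vcores]
        self.free = {name: [mem, cores] for name, (mem, cores) in nodes.items()}

    def allocate(self, memory_mb: int, vcores: int) -> Container | None:
        # Simplified first-fit placement: use the first node with enough free resources.
        for node, res in self.free.items():
            if res[0] >= memory_mb and res[1] >= vcores:
                res[0] -= memory_mb
                res[1] -= vcores
                return Container(node, memory_mb, vcores)
        return None  # no node can satisfy the request right now

    def release(self, c: Container) -> None:
        # Return the container's resources to the node it came from.
        self.free[c.node][0] += c.memory_mb
        self.free[c.node][1] += c.vcores


class ApplicationMaster:
    """Per-application coordinator: negotiates containers, then drives its own tasks."""

    def __init__(self, name: str, rm: ResourceManager, memory_mb: int, vcores: int) -> None:
        self.name = name
        self.rm = rm
        self.memory_mb = memory_mb
        self.vcores = vcores

    def run(self, tasks: list[Callable[[], object]]) -> list[object]:
        granted: list[Container] = []
        try:
            # Ask the ResourceManager for one container per task.
            for _ in tasks:
                c = self.rm.allocate(self.memory_mb, self.vcores)
                if c is None:
                    raise RuntimeError(f"{self.name}: insufficient cluster capacity")
                granted.append(c)
            # Task execution is the ApplicationMaster's job, not the ResourceManager's.
            return [task() for task in tasks]
        finally:
            # Give every container back when the application finishes (or fails).
            for c in granted:
                self.rm.release(c)


if __name__ == "__main__":
    rm = ResourceManager({"node1": (8192, 8), "node2": (8192, 8)})
    # Two different application types share one cluster through one ResourceManager.
    batch = ApplicationMaster("mapreduce-style", rm, memory_mb=2048, vcores=1)
    in_memory = ApplicationMaster("spark-style", rm, memory_mb=4096, vcores=2)
    print(batch.run([lambda: sum(range(10)), lambda: 7 * 6]))  # [45, 42]
    print(in_memory.run([lambda: max([3, 9, 4])]))             # [9]
    print(rm.free)  # {'node1': [8192, 8], 'node2': [8192, 8]}
```

Because the ResourceManager never inspects tasks, supporting a new kind of workload only requires a new ApplicationMaster; the shared resource layer stays unchanged.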

In conjunction with YARN, Prime Dimensions also offers integration support for other Hadoop projects, such as Tez, a framework for running data processing jobs as directed acyclic graphs (DAGs) of tasks, and Spark, an open-source, in-memory data analytics platform. Together, these projects make it possible to establish domain-specific enclaves over multi-tenant compute clusters, creating a virtualized data environment and unified analytics platform as enterprises evolve from “systems of record” to “systems of engagement.” This evolution often requires deploying in-memory, high-performance NoSQL technologies, but YARN, Tez and Spark offer new options for organizations seeking these analytic capabilities within Hadoop. As Hadoop gains widespread adoption, not only as a Big Data technology but also as a data warehouse augmentation strategy, its basic functionality is evolving to meet demands for higher performance and scalability. YARN and Tez are not simply new releases; they represent a revolutionary advancement of Hadoop.

We see tremendous opportunity for the adoption of YARN, Tez and Spark as enterprise solutions for generating advanced analytics with reduced time-to-value, and we expect significant demand for upgrading early adopters to Hadoop 2.0. Moreover, the advanced features and capabilities of YARN and Tez give rise to use cases that span many industries. There are also advantages to bringing together NoSQL, relational and/or in-memory solutions, both open source and proprietary; such analytic offload supports the establishment of a unified analytics environment.
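Tez's dataflow-graph model can be illustrated with a similarly small sketch. It assumes a job is described as named vertices (processing steps) plus edges (which upstream outputs each step consumes), and it uses Python's standard `graphlib.TopologicalSorter` to run every vertex once its inputs are ready. The function `run_dag` and the example vertices are hypothetical; real Tez jobs are defined through Tez's own Java APIs and run as tasks in YARN containers, which this single-process sketch does not attempt.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical toy DAG runner illustrating the dataflow-graph idea; NOT the Tez API.


def run_dag(
    vertices: dict[str, Callable[[list[Any]], Any]],
    edges: dict[str, list[str]],
) -> tuple[dict[str, Any], list[list[str]]]:
    """Run each vertex after all of its upstream vertices have produced output.

    vertices: vertex name -> function receiving the list of upstream outputs
    edges:    vertex name -> names of upstream vertices, in input order
    Returns every vertex's output and the "waves" of vertices that became
    ready together (and so could have run in parallel).
    """
    sorter = TopologicalSorter({v: edges.get(v, []) for v in vertices})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    outputs: dict[str, Any] = {}
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        for v in ready:
            inputs = [outputs[u] for u in edges.get(v, [])]
            outputs[v] = vertices[v](inputs)
        sorter.done(*ready)
    return outputs, waves


# Example: two independent sources feed a filter and a final summary step.
logs = ["error disk", "ok", "error net"]
users = {"alice", "bob"}
example_vertices = {
    "read_logs": lambda _: logs,
    "read_users": lambda _: users,
    "filter_errors": lambda ins: [line for line in ins[0] if line.startswith("error")],
    "summarize": lambda ins: (len(ins[0]), len(ins[1])),
}
example_edges = {"filter_errors": ["read_logs"], "summarize": ["filter_errors", "read_users"]}

outputs, waves = run_dag(example_vertices, example_edges)
print(waves)                 # [['read_logs', 'read_users'], ['filter_errors'], ['summarize']]
print(outputs["summarize"])  # (2, 2)
```

Expressing a whole pipeline as one graph, instead of a chain of separate MapReduce jobs, is the idea Tez builds on; the waves show which vertices have no dependencies on each other and could therefore run in parallel.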

To Learn More…

Contact: Michael Joseph, Managing Partner 703.861.9897 [email protected]

www.primedimensions.com

@PrimeDimensions