DRIVESCALE-MAPR Reference Architecture
©2018 DriveScale Inc. All Rights Reserved.
Table of Contents
Glossary of Terms . . . . . . . . . . 4
Table 1: Glossary of Terms . . . . . . . . . . 4
1. Executive Summary . . . . . . . . . . 5
2. Audience and Scope . . . . . . . . . . 5
3. DriveScale Advantage . . . . . . . . . . 5
Flex your data center with DriveScale for Big Data workloads . . . . . . . . . . 5
4. MapR Advantage . . . . . . . . . . 6
5. Industry Standard Servers and JBODs . . . . . . . . . . 6
6. DriveScale-MapR Solution Overview . . . . . . . . . . 7
7. DriveScale Components Overview . . . . . . . . . . 8
7.1 The DriveScale Composer, Server Agents and DriveScale Central . . . . . . . . . . 8
7.2 DriveScale HDD Appliance . . . . . . . . . . 9
7.3 DriveScale Solution Conceptual Diagram . . . . . . . . . . 10
8. Reference Architecture Details . . . . . . . . . . 11
8.1 Physical Cluster Components and Configuration List . . . . . . . . . . 11
8.2 Logical Cluster Topology . . . . . . . . . . 12
8.3 Physical Cluster Topology . . . . . . . . . . 13
8.4 Cluster Management . . . . . . . . . . 14
8.5 Disk and Filesystem Layout . . . . . . . . . . 16
8.6 DriveScale-MapR OS Supportability/Compatibility Matrix . . . . . . . . . . 17
9. Rack Scalability . . . . . . . . . . 17
10. References . . . . . . . . . . 17
11. Bill of Materials . . . . . . . . . . 18
12. Conclusion . . . . . . . . . . 18
Glossary of Terms
Table 1: Glossary of Terms
Data Node: Worker nodes of the cluster to which the MapR-FS data is written.
HDD Appliance: The DriveScale HDD Appliance is a 1RU Ethernet-to-SAS adapter serving as a bridge between 10 Gbps Ethernet-connected compute resources and JBODs full of commodity disks.
DriveScale Central: A web-based user interface to the DriveScale cloud that performs DriveScale account management. DriveScale Central (DSC) is where you download the keys that enable installation of the DriveScale software and then set up your DriveScale Management Domains (DMDs): create a domain, select and configure the Composer nodes for the domain, and select a chassis (with its associated DriveScale HDD Appliance) for the domain.
DriveScale Composer: Software that creates composable infrastructure from a set of diskless servers and disk drives.
HDD: Hard disk drive.
MapR-FS: The MapR distributed file system.
High Availability: The configuration that addresses availability issues in a cluster. In a standard configuration, the Name Node is a single point of failure (SPOF): each cluster has a single Name Node, and if that machine or process becomes unavailable, the cluster as a whole is unavailable until the Name Node is restarted or brought up on a new host. The secondary Name Node does not provide failover capability. High availability enables running two Name Nodes in the same cluster, an active and a standby; the standby Name Node allows a fast failover to a new Name Node in the case of a machine crash or planned maintenance.
JBOD: Just a bunch of disks; a collection of hard disks that have not been configured to act as a redundant array of independent disks (RAID).
Job History Server: The process that archives job metrics and metadata. One per cluster.
MLAG: Multi-chassis Link Aggregation; the ability of two or more switches to act like a single switch when forming link bundles.
Master/Control/Administrator Node: The metadata master of MapR, essential for the integrity and proper functioning of the distributed filesystem.
NIC: Network interface card.
Node Manager: The process that starts application processes and manages resources on the Data Nodes.
PDU: Power distribution unit.
ToR: Top of rack.
ZooKeeper: A centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services.
1. Executive Summary
DriveScale has engineered a next-generation Software Composable Infrastructure (SCI) solution designed to fundamentally change the way data center teams design, deploy, manage and consume hardware and software resources. DriveScale provides capabilities to IT operators to connect disaggregated pools of resources in an intelligent manner, and then to manage, modify and scale these resources over time. SCI results in a higher performance and more easily deployed infrastructure with fluid resources that can be used for modern Big Data workloads to significantly improve agility.
This document is a high-level design reference architecture guide for implementing MapR with DriveScale SCI on industry standard servers and JBODs. It introduces the high-level hardware and software components included in the stack, then describes each component individually. The reference architecture does not describe the MapR components or their applications.
2. Audience and Scope
This reference architecture guide is for Big Data and IT architects who are responsible for the design and deployment of MapR solutions on premises in data centers, as well as for Apache Hadoop administrators and architects, and data center architects or engineers who collaborate with specialists in that space.
3. DriveScale Advantage
Flex your data center with DriveScale for Big Data workloads
Big Data workloads have become an integral part of traditional data centers. They are designed to scale by adding more hardware resources to the same compute cluster. This flexibility is built into Big Data vendor software.
However, typical Big Data deployments have multiple limitations:
• Administrators can’t respond quickly to changing application stacks and data velocity.
• Deployments are over-provisioned with resources and under-utilized in order to meet service level guarantees.
• Multiple silos of hardware are created for each different application workload.
DriveScale’s rackscale SCI architecture provides solutions to all of these modern Big Data deployment limitations. With DriveScale’s SCI solution, administrators can flexibly deploy and manage independent pools of compute and storage resources at a lower cost without making changes to the application stack.
4. MapR Advantage
The MapR Converged Data Platform is a proven solution for delivering business value in data-driven companies. The MapR Converged Data Platform delivers speed, scale, and reliability, driving both operational and analytical workloads in a single platform. The MapR Platform is designed to deliver:
• High availability
• Ease of data integration
• Lower total cost of ownership
5. Industry Standard Servers and JBODs
The DriveScale solution works with any type of industry standard x86 server. Customers can customize the server with any compute (memory and CPU) configuration and purchase from existing OEMs and channels.
DriveScale recommends the purchase of high capacity JBODs (Just a Bunch of Disks) with dual hot-pluggable IO controllers (Expanders) and enough upstream bandwidth. The JBODs should also have dual hot-pluggable redundant power supplies. DriveScale has evaluated and tested various vendor offerings for redundancy, management functionality and performance. The table below lists DriveScale certified products with model numbers.
Table 2: DriveScale Certified JBODs
Dell: PowerVault MD3060e - 3.5" and 2.5", 60 bays, 4U, redundant expanders, 2 x 3 x mini-SAS 6G
Dell: Storage MD1280 - 3.5", 84 bays, 5U, redundant expanders, 2 x 3 x mini-SAS 6G
Hewlett Packard Enterprise: D6020 - 3.5", 70 bays, 5U, quad expanders, 4 x 2 x mini-SAS 12G
Hewlett Packard Enterprise: D6000 - 3.5", 70 bays, 5U, quad expanders, 4 x 2 x mini-SAS 6G
RAID Inc./Newisys: NDS-4600/4603 - 3.5", 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS 6G
RAID Inc./Newisys: NDS-2241 - 2.5", 24 bays, 2U, redundant expanders, 2 x 3 x mini-SAS 6G
RAID Inc./Newisys: NDS-4900 - 3.5", 90/96 bays, 4U, redundant expanders, 2 x 6 x mini-SAS-HD 12G
RAID Inc./Newisys: NDS-4900 - 3.5", 84 bays, 4U, redundant expanders, 2 x 5 x mini-SAS-HD 12G
Quanta (QCT): M6400H - 3.5", 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS 6G
Quanta (QCT): JB4602 - 3.5", 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS 12G
Promise Inc.: J5300s - 3.5", 12 bays, 2U, redundant expanders, 2 x 2 x mini-SAS-HD 12G
Promise Inc.: J5320s - 2.5", 24 bays, 2U, redundant expanders, 2 x 2 x mini-SAS-HD 12G
Promise Inc.: J5600 - 3.5", 16 bays, 3U, redundant expanders, 2 x 2 x mini-SAS-HD 12G
Promise Inc.: J5800 - 3.5", 24 bays, 4U, redundant expanders, 2 x 2 x mini-SAS-HD 12G
Lenovo: D3284 - 3.5", 84 bays, 5U, redundant expanders, 3 x 4 x mini-SAS-DS 12G
HGST: 4U60G2 - 3.5", 60 bays, 4U, redundant expanders, 2 x 4 x mini-SAS-HD 12G
HGST: Ultrastar Data102 - 102 bays, 4U, redundant expanders, 2 x 6 x mini-SAS-HD 12G
IBM: ESS JBOD Storage (5U84) - 3.5" and 2.5", 84 bays, 5U, redundant expanders, 2 x 3 x mini-SAS 12G
6. DriveScale-MapR Solution Overview
The DriveScale-MapR™ Big Data Solution is designed to address the complexity and silos that result from deploying different workloads on different clusters. The solution is designed with software defined composability as the primary goal. The composability lowers capex and opex costs, improves utilization by eliminating silos, and greatly simplifies the deployment of Big Data workload and analytics clusters.
Hadoop and other Apache projects are developed in Java and other programming languages by a global community of contributors. Yahoo, which has been the largest contributor to the project, uses Apache Hadoop extensively across its businesses. Core committers on the Hadoop project include employees from MapR, Cloudera, eBay, Facebook, Getopt, Hortonworks, Huawei, IBM, InMobi, INRIA, LinkedIn, Microsoft, Pivotal, Twitter, UC Berkeley, VMware, WANdisco, Yahoo, and many more individuals and organizations.
Hadoop deployments, other Apache projects, and 3rd party compute engines and custom apps for Big Data workloads are widely popular, but installing, configuring, and running a production cluster has challenges, including:
• Choosing the appropriate Big Data software distribution and extensions
• Installing the monitoring and management software
• Allocating Big Data services to physical nodes
• Selecting appropriate server hardware
• Rightsizing the storage configuration
• Implementing data locality
• Designing the network fabric
• Sizing and scaling the system
• Managing overall performance
These concerns are complicated by the need to understand the workloads running on the cluster, keep up with the fast-moving pace of core Apache projects, and manage a system designed to scale to thousands of nodes in a single instance.
The DriveScale-MapR Big Data Solution embodies all the hardware, software, resources and services needed to run a Big Data deployment as a single solution in a production environment. This end-to-end solution is specifically designed to accelerate large scale production while delivering the compute and storage performance needed. The solution components include the MapR Converged Data Platform, DriveScale software and hardware, industry standard servers, network switches, and JBODs built with standard disk drives.
These components span the entire solution stack:
• Reference architecture
• Optimized storage configurations
• Optimized network infrastructure
• MapR Converged Data Platform
This solution is designed to address the vast majority of Big Data and Apache Hadoop use cases, including but not limited to:
• Big data analytics
• ETL offload
• Data warehouse optimization
• Batch processing of unstructured data
• Big data visualization
• Search and predictive analysis
• Real-Time analytics and stream processing
7. DriveScale Components Overview
The DriveScale system is composed of the DriveScale Composer and Server Agents (software), DriveScale Central (a cloud service) and the DriveScale HDD Appliance (hardware).
7.1 The DriveScale Composer, Server Agents and DriveScale Central
a) The DriveScale Composer
The DriveScale Composer is the heart of the DriveScale SCI solution. The Composer holds the inventory of all resources, composes clusters of compute and storage resources via simple GUI control, monitors and manages clusters, and returns resources to pools for re-use when workloads have finished running.
• The server running the Composer software is called the Composer node. A typical deployment consists of three Composer nodes in a clustered configuration for high availability (HA).
• The Composer contains the configuration and information database for:
  - Inventory: DriveScale Composer nodes, DriveScale HDD Appliances, network switches, JBODs, chassis, disk drives and composable server nodes.
  - Configuration: node templates, cluster templates, configured clusters.
  - Composer Database: used as a message bus to communicate with the servers and drives.
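The Composer's pool-and-template model described above can be illustrated with a short Python sketch. All class names and fields below are hypothetical illustrations of the concept, not DriveScale's actual API:

```python
# Hypothetical sketch of the Composer's resource model: free pools of
# servers and drives are combined into a cluster from a node template.
# Names and fields are illustrative only, not a DriveScale API.
from dataclasses import dataclass, field

@dataclass
class NodeTemplate:
    name: str
    drives_per_node: int

@dataclass
class Inventory:
    free_servers: list = field(default_factory=list)
    free_drives: list = field(default_factory=list)

    def compose(self, template: NodeTemplate, node_count: int):
        """Bind free drives to free servers; return the composed nodes."""
        needed = node_count * template.drives_per_node
        if len(self.free_servers) < node_count or len(self.free_drives) < needed:
            raise RuntimeError("not enough free resources in the pool")
        nodes = []
        for _ in range(node_count):
            server = self.free_servers.pop()
            drives = [self.free_drives.pop()
                      for _ in range(template.drives_per_node)]
            nodes.append((server, drives))
        return nodes

# Pools matching this reference architecture: 5 servers, 60 JBOD drives.
inv = Inventory(free_servers=[f"srv{i}" for i in range(5)],
                free_drives=[f"hdd{i}" for i in range(60)])
cluster = inv.compose(NodeTemplate("data-node", drives_per_node=8),
                      node_count=5)
```

When a workload finishes and nodes are decommissioned, the servers and drives return to the free pools for re-use, which is the behavior the paragraph above describes.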
b) DriveScale Server Agents
The DriveScale Server Agent is installed on all servers to be composed. The Server Agent provides inventory to the DriveScale Composer and creates mappings between composed server nodes and disk drives.
c) DriveScale Central
DriveScale Central is a cloud-based portal that provides a repository for:
• Software distribution
• DriveScale keys
• Centralized log files
• User documentation
• License manager
7.2 DriveScale HDD Appliance
The DriveScale HDD Appliance is a 1U appliance with adapters that connect to servers via 10Gb Ethernet interfaces and to JBODs via SAS interfaces. The HDD Appliance software allows JBOD drives to be mapped to servers and used as local drives.
Figure 1: DriveScale HDD Appliance
7.3 DriveScale Solution Conceptual Diagram
Figure 2: DriveScale Cluster Components
8. Reference Architecture Details
8.1 Physical Cluster Components and Configuration List
Table 3: Cluster Physical Components List
Component | Configuration | Description | Quantity
DriveScale HDD Appliance | DHCP, jumbo frames enabled | 1U appliance with adapters to connect servers via Ethernet and JBODs via SAS. | 1
DriveScale HDD Appliance Controller | DHCP, jumbo frames enabled | Provides the data network. | 4 for each chassis
DriveScale Composer | DriveScale Composer running on a VM | Manages and configures nodes and clusters; stores the inventory/configuration repository of each component. | Min. 1; for HA, 3 Composers configured as master and slaves
Servers | Two-socket CPU and memory per the individual Hadoop cluster requirements | Commodity x86 servers that house all the NodeManager compute instances and DriveScale Server Agents. | Min. 1 name node + 3 data nodes
HDD for Servers | Two drives configured in RAID 1 | The internal drives should be used for the OS install. | 2 for each server
NICs | Dual-port 10 Gbps Ethernet NICs; the connector type (SFP+ or twinax) depends on the network design | Provides the data network. | Min. 1 for each server
JBOD | Default configuration | Houses the disk drives, with dual IO controllers. | Min. 1
HDD for JBOD | Default configuration | Disk drives that store data for the cluster. | Depends on the cluster requirements
ToR 10G switch | LLDP, MLAG, 9K jumbo frames configured | Provides data network connectivity. | 2 for each rack
ToR 1G switch | Default configuration | Provides management network connectivity. | 1 for each rack
MapR Installer | MapR Installer server running on a VM | Manages, configures and monitors the MapR cluster. | 1 for each environment
8.2 Logical Cluster Topology
The minimum requirements to build out the cluster are:
• 1 Administrator node
• 4 Data Nodes
• 1 DriveScale HDD Appliance
• 1 DriveScale Composer
• 2 10G switches
• 1 1G switch
• 1 JBOD chassis with disk drives
• 1 MapR Installer server
This reference architecture is built on one administrator node and four data nodes, with one JBOD holding 60 x 1 TB HDDs. The following table lists the server configurations and the number of drives used.
Table 4: Server Configuration

Component | Configuration | Description | Quantity
Controller/Administrator node | Two-socket 20-core CPU, 256 GB RAM, 10GbE Intel NIC, two internal HDDs for the OS and eight high-capacity HDDs mounted from the JBOD | Hosts the MapR node and agents, along with the DriveScale Server Agents. | 1
Data nodes | Two-socket 16-core CPU, 256 GB RAM, 10GbE Intel NIC, two internal HDDs for the OS and eight high-capacity HDDs mounted from the JBOD | House the MapR-FS nodes, ZooKeepers, CLDB, YARN NodeManagers and any additional required services, with DriveScale agents. | 4
Notes:
- Customers with higher (or lower) compute needs can acquire bigger (or smaller) data nodes configured with CPU and memory that fits the specific requirements of their applications.
- Similarly, depending on the data requirements, customers can add or remove disk drives to match the specific needs of their applications.
- Due to MapR’s distributed metadata model, all five nodes can be used for data and processing.
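Given the five nodes above, each mounting eight JBOD drives, usable capacity can be estimated with a quick back-of-the-envelope calculation. A sketch, assuming the 1 TB drives of this reference configuration and MapR-FS's default 3x replication:

```python
# Rough capacity estimate for the reference configuration:
# 5 nodes x 8 JBOD drives x 1 TB, with MapR-FS default 3x replication.
nodes = 5
drives_per_node = 8
drive_tb = 1.0
replication = 3

raw_tb = nodes * drives_per_node * drive_tb       # 40 TB raw
usable_tb = raw_tb / replication                  # ~13.3 TB usable
print(f"raw: {raw_tb:.0f} TB, usable (3x replication): {usable_tb:.1f} TB")
```

The same arithmetic applies when drives are added or removed per the notes above: usable capacity scales linearly with the number of drives mounted to the nodes.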
8.3 Physical Cluster Topology
Figure 3: DriveScale Lab Architecture with 1 HDD Appliance (4x Adapters in use), 1 JBOD, 1 Controller/Data Node and 4 Data Nodes
8.4 Cluster Management
This section details the steps for setting up a DriveScale-enabled Hadoop cluster using the MapR Installer server.
8.4.1 Setting up the DriveScale Cluster
The following tasks must be completed to set up the DriveScale solution before installing (or reusing an existing install of) the MapR Installer server:
1. Rack and install the DriveScale HDD Appliance using the supplied documentation.
2. Rack and install the JBOD using the documentation provided by the vendor.
3. Rack and install the servers using the documentation provided by the vendor.
4. Create a RAID 1 (or other RAID of your choice) configuration for the internal HDDs on each server, and install the OS on all servers.
5. Install and configure the DriveScale Composer on a VM or a standalone server.
6. Set up the HDD Appliance configuration from the Composer.
7. Install and configure the DriveScale Server Agents on the master and data nodes.
8. Create master/data node and cluster template with required disk drives using the Composer.
9. Create the cluster from the template using the Composer.
10. Ensure that the DriveScale cluster is up and running before proceeding.
Figure 4: Logical Cluster status from the Composer UI
Figure 5: Logical cluster details overview from the Composer UI
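Once the cluster is composed (step 10), each node should see its JBOD drives as ordinary local block devices. A minimal Python sketch of that sanity check, operating on the JSON output of "lsblk -J -d -o NAME,TYPE"; the sample data and the assumption of two internal OS disks are illustrative:

```python
# After composition, each data node should see its eight JBOD drives
# as local block devices alongside the two internal RAID-1 OS drives.
# This helper counts the data disks in lsblk JSON output; the sample
# below is illustrative, not captured from a real node.
import json

def count_data_disks(lsblk_json: str, os_disks: int = 2) -> int:
    """Disks visible to the node, minus the internal OS drive pair."""
    devices = json.loads(lsblk_json)["blockdevices"]
    disks = [d for d in devices if d["type"] == "disk"]
    return len(disks) - os_disks

# Fabricated sample: ten disks (sda..sdj) reported by lsblk.
sample = json.dumps({"blockdevices":
    [{"name": f"sd{chr(97 + i)}", "type": "disk"} for i in range(10)]})
print(count_data_disks(sample))  # 8: ten disks seen minus two OS drives
```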
Figure 6: Logical cluster server details from the Composer UI
8.4.2 Setting up the MapR Cluster
1. After the successful completion of the steps above, review the MapR prerequisites to ensure all requirements are met.
2. Install the MapR Installer server using the MapR Installer guide.
3. Launch the MapR Installer from a browser and follow the onscreen instructions to install the services required on your MapR cluster.
4. For this reference architecture, only the YARN and MapReduce services were set up.
Figure 7: Installed services details from MapR Installer after successful installation
Figure 8: Installed services details from MapR UI Dashboard
5. Ensure that the control and data nodes are up and running with the right assigned roles and storage.
Figure 9: Storage overview from MapR UI Dashboard
8.5 Disk and Filesystem Layout
Node/Role | Disk and Filesystem Layout | Description
Management/Master/YARN NodeManagers | MapR-FS | 2 TB drives are mounted from the JBODs.
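On each node, the mounted JBOD drives are handed to MapR-FS via MapR's disksetup utility, which consumes a text file listing one device path per line. A sketch of generating that list; the device names are examples and must be confirmed against each node's actual inventory before formatting anything:

```python
# Sketch: build the disk list consumed by MapR's disksetup utility,
# one device path per line. Device names (sdc..sdj) are illustrative;
# always confirm against lsblk on the node before formatting drives.
def disks_file_contents(first: str = "sdc", count: int = 8) -> str:
    start = ord(first[-1])
    devices = [f"/dev/sd{chr(start + i)}" for i in range(count)]
    return "\n".join(devices) + "\n"

contents = disks_file_contents()  # /dev/sdc .. /dev/sdj
# Written to e.g. /tmp/disks.txt, the file is then passed on each node:
#   /opt/mapr/server/disksetup -F /tmp/disks.txt
print(contents.splitlines()[0])  # /dev/sdc
```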
8.6 DriveScale-MapR OS Supportability/Compatibility Matrix
OS | Composer | Server Nodes | MapR
CentOS/RHEL 6.x | X | X | -
CentOS/RHEL 7.x | X | X | X
Ubuntu 14.04 | X | X | X
Ubuntu 16.04 | X | X | X
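The matrix above can be expressed as a quick programmatic lookup, which is handy when validating a planned deployment. A sketch; the table data simply mirrors section 8.6:

```python
# The section 8.6 support matrix as a lookup table. A full-stack
# deployment needs Composer, Server Agent and MapR support on one OS.
SUPPORT = {
    "CentOS/RHEL 6.x": {"composer": True, "server": True, "mapr": False},
    "CentOS/RHEL 7.x": {"composer": True, "server": True, "mapr": True},
    "Ubuntu 14.04":    {"composer": True, "server": True, "mapr": True},
    "Ubuntu 16.04":    {"composer": True, "server": True, "mapr": True},
}

def full_stack_supported(os_name: str) -> bool:
    row = SUPPORT.get(os_name)
    return bool(row) and all(row.values())

print(full_stack_supported("CentOS/RHEL 7.x"))  # True
print(full_stack_supported("CentOS/RHEL 6.x"))  # False: no MapR support
```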
9. Rack Scalability
Customers can scale beyond one rack to expand their compute and storage resources as application needs grow. Compute-to-storage ratios can be changed or maintained for new or existing racks. For every JBOD added, a new DriveScale HDD Appliance with four controllers must be added as well. Since disk drives are assigned to servers within the same rack, scaling is achieved simply by adding more racks with servers, DriveScale HDD Appliances, switches and JBODs.
Figure 10: Rack Scalability
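The per-rack quantities implied by section 9 and Table 3 can be captured in a small sketch. The server and JBOD counts per rack are deployment-specific inputs; the derived quantities follow the one-appliance-per-JBOD and two-ToR-switches-per-rack rules stated above:

```python
# Per-rack resource model: one HDD Appliance (with four controllers)
# per JBOD, an MLAG pair of 10G ToR switches and one 1G management
# switch per rack, per section 9 and Table 3.
def rack_bill(jbods_per_rack: int, servers_per_rack: int) -> dict:
    return {
        "servers": servers_per_rack,
        "jbods": jbods_per_rack,
        "hdd_appliances": jbods_per_rack,            # one per JBOD
        "appliance_controllers": 4 * jbods_per_rack,  # four per appliance
        "tor_10g_switches": 2,                        # MLAG pair
        "tor_1g_switches": 1,                         # management network
    }

print(rack_bill(jbods_per_rack=2, servers_per_rack=10))
```

Multiplying this bill by the number of racks gives the incremental hardware required as the deployment scales out.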
10. References
1. MapR Installer Prerequisites and Guidelines: http://maprdocs.mapr.com/home/AdvancedInstallation/c_install_prerequisites.html
2. MapR Installer setup: http://maprdocs.mapr.com/home/MapRInstaller.html
3. DriveScale racking and installation documentation, provided by DriveScale
4. YARN definition: http://searchdatamanagement.techtarget.com
11. Bill of Materials
Server Components | Quantity
Intel Xeon processor-based servers with dual- or quad-port 10GbE SFP+ NICs; exact CPU models, socket counts and memory are based on customer application needs | Depends on customer application needs

JBOD Components | Quantity
DriveScale certified JBODs | Depends on customer application needs
NL-SAS HDDs | Depends on customer application needs

Switch | Quantity
DriveScale certified 10GbE SFP+ switches | An even number of switches for a redundant switch fabric
1GBase-T switch | Based on the number of servers and JBODs in the configuration

DriveScale Components | Quantity
DriveScale HDD Appliance | One for each JBOD
DriveScale Adapter | Four for each HDD Appliance

Software | Version
CentOS | See section 8.6
DriveScale HDD Appliance | 1.4
MapR | 5.2
12. Conclusion
The DriveScale-MapR solution reference architecture guide is designed to provide an overview of the combined solution and the components employed in the solution. The reference architecture also outlines the advantages of compute and storage disaggregation with the DriveScale-MapR solution.
DriveScale, Inc 1230 Midas Way, Suite 210 Sunnyvale, CA 94085
Main: +1 (408) 849-4651 | www.drivescale.com
©2018 DriveScale Inc. All Rights Reserved.
ra.20171218.002.Rev001