
Intel® Rack Scale Architecture


Abstract

Title: Intel Rack Scale Architecture

• This talk provides an overview of Intel Rack Scale Architecture and discusses how this architecture addresses underutilized and stranded resources in a data center through resource pooling.

• We will also specifically discuss the concept of a pooled system and a storage node, and the pooling of PCIe as well as NVMe-based storage.

• The impact of pooling on latency, radix, and failure domains will also be discussed.

• Further, pooling introduces a need for composition of the platform. We will also discuss the characteristics of such platform composition and how software can emerge to take advantage of these capabilities.


Intel Confidential

Rack Scale Architecture

Mohan J. Kumar

Intel Fellow, Data Center Group


Data Center Challenges

Infrastructure has not kept up with increasing business demands

Business Needs
• Reduce operational and capital expenses.
• Deliver new services in minutes, not months.
• Optimize data center based on real-time analytics.
• Address application workload needs with agility.
• Scale capacity without interruption.

Inefficiency: Less than 50% server utilization²

Growth: Data growth doubles every 18 months¹

Agility: New services can take a week or more to provision¹

¹ Worldwide and Regional Public IT Cloud Services 2013–2017 Forecast. IDC (August 2013)
² IDC’s Digital Universe Study, sponsored by EMC, December 2012


Today’s Architecture

• Proprietary and preconfigured

• Upgrade as a system

• Limited flexibility


What’s Next?

A seismic shift in how data centers are built and managed, powered by Intel

All infrastructure delivered as a service

Resources automatically tuned to application workloads

Hyper-scalable to keep up with business demands


Software Defined Infrastructure

Dedicated Appliances: SAN, NAS, Network Appliance, Telco Appliance


Simplified Platform Management

Disaggregate

Pool & Compose

Intel® Rack Scale Architecture

Logical architecture for efficiently building and managing cloud infrastructure, and providing the simplest path to a software defined data center.

User-Defined Performance

Maximum Utilization

Interoperable Solutions

Increase performance per TCO$ & accelerate cloud adoption


Rack Scale – Architecture Framework

Modular scalable management architecture

1. Pooled systems
2. Network fabric management
3. Pod-wide storage
4. Pod management

[Figure labels: Pod-wide Management, Scalability, Pod-wide storage, Network fabric]


Management Software Framework

Flexible management architecture allowing for a range of implementation options

• Asset & location discovery
• Disaggregated resource management
• Composable system support
• Support compute, network, and storage

Comprehensive management architecture

[Figure labels: Network Switch, Rack Scale API, Rack Scale POD Management API]


Rack Scale Pooled System Platform

Compute Node

• Supports a range of server processors

• Pooled NVM

• Supports Ethernet fabric

• Service model requires full node replacement

Storage Node

• Supports a range of server processors

• Direct Attach Storage

• Pooled NVM

• Supports Ethernet fabric

• Redundant networks, scalable DAS storage, sub-node FRUs


Rack Scale Pooled NVMe Controller (PNC)

[Figure: Pooled NVMe Controller with Server Node 1 through Server Node n, Drive 1 through Drive n, and a PSME management port.]

Assign Drive 2 to Node 1

Assign Drive 1 and Drive 3 to Node 2

Logical effect of the assignment: Server Node 1 sees Drive 2; Server Node 2 sees Drive 1 and Drive 3.
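The whole-drive assignment on this slide can be illustrated with a toy model. This is a minimal sketch, not the PNC's actual interface: the class `PooledNvmeController` and its methods `assign`, `release`, and `visible_to` are hypothetical names chosen for illustration, and the model only tracks ownership of drives, not the PCIe plumbing behind it.

```python
from dataclasses import dataclass, field


@dataclass
class PooledNvmeController:
    """Hypothetical toy model: tracks which server node owns each whole drive."""

    drives: set[str]
    owner: dict[str, str] = field(default_factory=dict)  # drive name -> node name

    def assign(self, drive: str, node: str) -> None:
        """Give one whole drive to one node; a drive has at most one owner."""
        if drive not in self.drives:
            raise KeyError(f"unknown drive: {drive}")
        if drive in self.owner:
            raise ValueError(f"{drive} is already assigned to {self.owner[drive]}")
        self.owner[drive] = node

    def release(self, drive: str) -> None:
        """Return a drive to the pool (no-op if it was not assigned)."""
        self.owner.pop(drive, None)

    def visible_to(self, node: str) -> list[str]:
        """Drives that logically appear attached to the given node."""
        return sorted(d for d, n in self.owner.items() if n == node)


# The two assignments shown on the slide.
pnc = PooledNvmeController(drives={"Drive 1", "Drive 2", "Drive 3"})
pnc.assign("Drive 2", "Node 1")
pnc.assign("Drive 1", "Node 2")
pnc.assign("Drive 3", "Node 2")

node1_view = pnc.visible_to("Node 1")
node2_view = pnc.visible_to("Node 2")
```

In this sketch, `node1_view` and `node2_view` correspond to the "logical effect of the assignment": each node sees only the drives assigned to it, as if they were locally attached.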


Rack Scale Pooled NVMe Controller (PNC)

[Figure: Pooled NVMe Controller with Server Node 1 through Server Node n, Drive 1 through Drive n, and a PSME management port. Drive 1 capacity: 1 TB.]

Assign ½ capacity of Drive 1 to Node 1

Assign ½ capacity of Drive 1 to Node 2

Logical effect of the assignment: Server Node 1 sees Drive 1’ (capacity: ½ TB); Server Node 2 sees Drive 1” (capacity: ½ TB).
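Partial drive assignment can be sketched as carving capacity slices out of one physical drive. This is an illustrative model under stated assumptions: `PooledDrive`, `assign`, and `free_gb` are hypothetical names, and 1 TB is taken as 1000 GB for the arithmetic. How a real controller partitions a drive (for example, by namespaces) is not stated in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class PooledDrive:
    """Hypothetical model of one pooled drive split into per-node slices."""

    name: str
    capacity_gb: int
    slices: dict[str, int] = field(default_factory=dict)  # node name -> GB assigned

    def free_gb(self) -> int:
        """Capacity not yet assigned to any node."""
        return self.capacity_gb - sum(self.slices.values())

    def assign(self, node: str, size_gb: int) -> None:
        """Assign size_gb of this drive to node, refusing to over-commit."""
        if size_gb <= 0:
            raise ValueError("size_gb must be positive")
        if size_gb > self.free_gb():
            raise ValueError(
                f"{self.name}: requested {size_gb} GB, only {self.free_gb()} GB free"
            )
        self.slices[node] = self.slices.get(node, 0) + size_gb


# Assumption: 1 TB drive modeled as 1000 GB, split in half as on the slide.
drive1 = PooledDrive("Drive 1", capacity_gb=1000)
drive1.assign("Node 1", 500)  # appears to Node 1 as a 1/2 TB drive
drive1.assign("Node 2", 500)  # appears to Node 2 as a 1/2 TB drive
remaining_gb = drive1.free_gb()
```

The over-commit check is the design point: once both halves are assigned, any further request fails rather than silently sharing capacity between nodes.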


Local vs. Pooled Storage

[Chart: Pooling Value across deployment scenarios. Scenario labels: Avg IOPS/SVR < Local Capacity; Avg IOPS/SVR = Local Capacity; IOPS/SVR > Local Capacity. Other labels: less than capacity deployed; more than capacity deployed; Consolidation of Storage; Consolidation of compute; Pooled Infrastructure cost; OpEx Value of pooling; WL/SVR Local Deployment model?; Diverse Workload deployment.]
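A small worked example may help show why pooling matters most when workloads are diverse. The numbers below are invented for illustration and do not come from the talk; the functions `local_model` and `pooled_model` are hypothetical. The sketch compares the same total IOPS capacity deployed per server versus shared in one pool, and it deliberately ignores the added pooling latency and the pooled infrastructure cost.

```python
def local_model(demands: list[int], capacity_per_server: int) -> tuple[int, int]:
    """Each server can only use its own local capacity.

    Returns (stranded, unmet): capacity left idle on lightly loaded servers,
    and demand that heavily loaded servers cannot satisfy locally.
    """
    stranded = sum(max(capacity_per_server - d, 0) for d in demands)
    unmet = sum(max(d - capacity_per_server, 0) for d in demands)
    return stranded, unmet


def pooled_model(demands: list[int], capacity_per_server: int) -> tuple[int, int]:
    """The same total capacity is shared, so only aggregate demand matters."""
    total_capacity = capacity_per_server * len(demands)
    total_demand = sum(demands)
    stranded = max(total_capacity - total_demand, 0)
    unmet = max(total_demand - total_capacity, 0)
    return stranded, unmet


# Hypothetical IOPS demand for four servers; the average equals local capacity.
demands = [20_000, 50_000, 150_000, 180_000]
capacity = 100_000

local_result = local_model(demands, capacity)
pooled_result = pooled_model(demands, capacity)
```

With these assumed numbers the local model strands 130,000 IOPS while leaving 130,000 IOPS of demand unmet, and the pooled model does neither. If every server's demand equaled its local capacity, the two models would give the same result and pooling would add only cost and latency.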


Rack Scale Pooled NVMe Controller (PNC)

• Enable pooling and disaggregation of PCIe devices away from compute / storage nodes

• Enable disaggregation of PCIe devices including Storage, FPGA

• Assign high performance storage to nodes based on workload demand

• Allow full and partial drive assignment

• Prevent SPOF through host failover

• Enables ease of workload migration in hyperscale environment

• Enables better utilization of Data Center resources by allowing composable high performance IO capacity

[Figure: Pooled NVMe Controller with Server Node 1 through Server Node n, Drive 1 through Drive n, and a PSME management port.]


Rack Scale Pooled NVMe Controller (PNC)

• Enable disaggregation of NVM Express devices

• Utilizes NVMe over Fabrics (NVMe-oF) to expand radix of pooling

• Assign storage to Compute or Storage nodes based on workload demand

• Prevent SPOF through host failover

• Enables ease of workload migration in hyperscale environment

• Enables better utilization of DC resources through composition

[Figure labels: PNC, Management link, PSME, Server Node 1 through Server Node n, Drive 1 through Drive n, Ethernet]


Local vs. Pooled Resources

Attribute | Resource Local to a Node | Pooled Resource
Device Attach and Capacity | Limited by physical constraints | Not constrained by node volumetrics
Device Availability (Failure Domain) | Node is SPOF | Pooled fabric
Utilization | Limited by local use | No stranded capacity/capability
Latency | Local access | Incurs additional pooling latency
Radix | Local | Limited by pooling fabric (one rack to multiple racks)
Refresh and Life Cycle Management | Node based | Component based


Composable Infrastructure – Software Implications


Orchestration that comprehends composition capabilities

Location aware placement of workloads

Location aware placement in hyperscale storage

Monitoring software that knows the physical bounds of the hardware

Software (OS, VMM, App) capability to take advantage of dynamically added resources

[Figure: resources are composed from a Pool of Resources into Composed System 1 through Composed System n, and released back to the pool.]
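The compose and release cycle, together with location-aware placement, can be sketched as follows. This is an illustrative model only: `Resource`, `ResourcePool`, `compose`, and `release` are hypothetical names, and the "all resources from one rack" rule is an assumed placement policy used to show what orchestration that comprehends physical bounds might check. It is not the Rack Scale API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "cpu" or "nvme"
    rack: str  # physical location, used for location-aware placement
    name: str


class ResourcePool:
    """Hypothetical pool from which systems are composed and later released."""

    def __init__(self, resources: list[Resource]) -> None:
        self.free: list[Resource] = list(resources)
        self.composed: dict[str, list[Resource]] = {}

    def compose(self, system: str, wanted: dict[str, int], rack: str) -> list[Resource]:
        """Compose a system from free resources that all sit in one rack.

        Nothing is taken from the pool unless every request can be met.
        """
        if system in self.composed:
            raise ValueError(f"{system} is already composed")
        picked: list[Resource] = []
        for kind, count in wanted.items():
            candidates = [r for r in self.free if r.kind == kind and r.rack == rack]
            if len(candidates) < count:
                raise RuntimeError(f"rack {rack} has fewer than {count} free {kind}")
            picked.extend(candidates[:count])
        for r in picked:
            self.free.remove(r)
        self.composed[system] = picked
        return picked

    def release(self, system: str) -> None:
        """Return all resources of a composed system to the pool."""
        self.free.extend(self.composed.pop(system))


pool = ResourcePool([
    Resource("cpu", "A", "cpu-a1"), Resource("cpu", "A", "cpu-a2"),
    Resource("nvme", "A", "nvme-a1"), Resource("nvme", "A", "nvme-a2"),
    Resource("cpu", "B", "cpu-b1"),
])
system1 = pool.compose("Composed System 1", {"cpu": 1, "nvme": 2}, rack="A")
free_after_compose = len(pool.free)
pool.release("Composed System 1")
free_after_release = len(pool.free)
```

The sketch checks availability for every requested kind before removing anything from the pool, so a failed composition leaves the pool unchanged.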


Summary


USER-DEFINED PERFORMANCE

• Tailor performance to meet application SLAs by selecting from pooled compute, storage & network resources

• Easily scale capacity with modular, buy-as-you-go architecture

MAXIMUM UTILIZATION

• Autonomously manage compute, network & storage pools to virtually eliminate stranded resources

INTEROPERABLE SOLUTIONS

• Interoperable system architecture simplifies data center operations and integration of multi-vendor solutions

“Buy What You Need. Use What You Buy”


Intel Confidential — Do Not Forward