Ver1.1 2013/02/10 Etsuji Nakai NII dodai-compute2.0 project Deploying Baremetal Instances with OpenStack

Deploying Baremetal Instances with OpenStack




$ who am i

Etsuji Nakai
– Senior solution architect and cloud evangelist at Red Hat.
– Working for NII (National Institute of Informatics Japan) as a cloud technology consultant.
– The author of the “Professional Linux Systems” series.
• Professional Linux Systems: Technology for Next Decade
• Professional Linux Systems: Deployment and Management
• Professional Linux Systems: Network Management
• Available only in Japanese. Translation offers from publishers are welcome ;-)


Background of the project


Why does baremetal matter?

General usecase
– I/O intensive application (RDB)
– Realtime application (deterministic latency)
– Native processor features
– etc.

Specific usecase in “Academic Research Cloud (ARC)” of NII
– Flexible extension of existing server cluster.
– Flexible extension of existing cloud infrastructure.


Academic Research Cloud (ARC) in NII, today.

This is a prototype of the Japan-wide research cloud.
– It's now running in NII's laboratories, and will be extended as a Japan-wide research cloud.

Research labs can extend their existing clusters (HPC cluster, cloud infrastructures, etc...) by attaching baremetal servers from the resource pool.

[Figure: existing HPC clusters and existing cloud infrastructures attach to the baremetal resource pool over an L2 connection (VLAN). A self service portal provides on-demand provisioning/de-provisioning for flexible extension of existing clusters.]


Future plan of the ARC.

ARC will be extended as a Japan-wide cloud with SINET4 WAN connection.
– SINET4 is an MPLS based wide area Ethernet service for academic facilities in Japan, operated by NII.

[Figure: existing HPC clusters, existing cloud infrastructures, and the baremetal resource pool connected through SINET4, an MPLS based wide area Ethernet (http://www.sinet.ad.jp/index_en.html).]


Overview of dodai-compute1.0

What is dodai-compute?
– Baremetal driver extension of Nova, currently used in ARC.
• Designed and developed by NII in 2012.
• Based on Diablo with Ubuntu 11.10.
• Source code: https://github.com/nii-cloud/dodai-compute
– Upside: Simple extension aimed at the specific usecase :-)
– Downside: Unsuitable for general usecase :-(
• Cannot manage a mixed environment of baremetal and hypervisor hosts.
• One-to-one mapping from instance flavor to baremetal host. (No scheduling logic to select a suitable host automatically.)
• Nonstandard use of availability zone. (Used for host status management.)

The most outstanding issue: it's not merged in upstream. No community support, no future!


Planning of ARC baremetal provisioning feature

It should be designed based on the framework in the upstream.
– Existing framework: GeneralBareMetalProvisioningFramework.
• So called “NTTdocomo-openstack.”
• Blueprint: http://wiki.openstack.org/GeneralBareMetalProvisioningFramework
• Source code: https://github.com/NTTdocomo-openstack/nova

As a first step, we compared the architectures of “dodai-compute” and “NTTdocomo-openstack”, and considered the following questions.
– What's common and what's different?
– What can be more generalized in “NTTdocomo-openstack”?
– What should be added for use in ARC?

The goal of the “dodai-compute2.0” project is:
– Extend the upstream framework for ARC.
– Not to be a private branch; stay in the upstream.

Note:
– The NTTdocomo-openstack branch has been merged in the upstream with many modifications. Although these slides are based on the NTTdocomo-openstack branch, the future extension will be done directly on the upstream.


By the way, what does “dodai” stand for?

1. Base, Foundation, Framework, etc.
2. A sub flight system (SFS) featured in Mobile Suit Gundam.


Comparison of dodai-compute1.0 and NTTdocomo-openstack


Today's Topics

1. Coupling Structure with Nova Scheduler.
2. OS Provisioning Mechanism.
3. Network Virtualization.


Coupling Structure with Nova Scheduler


General flow of instance launch

[Figure: each Compute Driver registers its hosts to Nova Scheduler. Nova Scheduler selects a host for a new instance and asks that host's Compute Driver to launch the instance. The Compute Driver launches a VM.]

Question:
– How can we apply baremetal servers in place of VM instances in this structure?


A1. Register “Baremetal Pool” as an “Instance Host”

dodai-compute takes this approach. Its driver acts as a single host which accommodates multiple baremetal servers.

[Figure: Compute Drivers register baremetal pools to Nova Scheduler. Nova Scheduler selects a pool for a new instance and asks the Compute Driver to launch the instance. The Compute Driver selects a baremetal server to launch from its pool and launches it.]


A2. Register each baremetal as a “Single Instance Host”

NTTdocomo-openstack takes this approach. Its driver acts as a proxy for baremetal servers, each of which accommodates just one instance.

[Figure: the Compute Driver registers each baremetal server as a host to Nova Scheduler. Nova Scheduler selects a baremetal server for a new instance and asks the Compute Driver to launch the instance. The Compute Driver launches the selected baremetal server.]
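The difference between approaches A1 and A2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`host`, `max_instances`) and the function names are hypothetical and do not come from either code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaremetalServer:
    name: str


def register_as_pool(pool_name, servers):
    """A1 (dodai-compute style): the whole pool is one scheduler host.

    The scheduler sees a single entry; choosing a server inside the
    pool is left to the driver. Field names are hypothetical.
    """
    return [{"host": pool_name, "max_instances": len(servers)}]


def register_each_server(servers):
    """A2 (NTTdocomo-openstack style): every server is its own host.

    Each entry accommodates exactly one instance, so the scheduler
    itself picks the server.
    """
    return [{"host": s.name, "max_instances": 1} for s in servers]


servers = [BaremetalServer("bm01"), BaremetalServer("bm02")]
pool_view = register_as_pool("pool-a", servers)
per_server_view = register_each_server(servers)
print(len(pool_view), len(per_server_view))
```

With A1 the scheduler has one candidate and no real choice; with A2 it has one candidate per server, which is what lets the standard scheduling logic apply.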


Class structure for coupling with Nova

dodai-compute1.0 and NTTdocomo-openstack have basically the same class structure in terms of coupling with Nova.
– The drawing is the case of dodai-compute1.0.
– NTTdocomo-openstack uses “BareMetalDriver” in place of “DodaiConnection”.

https://github.com/nii-cloud/dodai-compute/wiki/Developer-guide

[Figure: class diagram with three annotations: the base class of different kinds of virtualization hosts, the driver for libvirt managed hypervisors (KVM/LXC), and the driver for baremetal management.]


How does Nova Scheduler see baremetal servers?

dodai-compute's driver acts as a single host which accommodates multiple baremetal servers.

– It's like representing a baremetal pool as a single “Host” which runs baremetal servers as its “VM's”.

– Scheduling policy is implemented in the driver side. (Nova Scheduler has no choice of hosts.)

[Figure: Nova API and Nova Scheduler in front of Nova Compute (DodaiConnection). The scheduler recognizes it as a single host, a host of “baremetal VM's”. The driver chooses the host to provision by referring to dodai db (baremetal server information).]


How does Nova Scheduler see baremetal servers?

NTTdocomo-openstack driver acts as a proxy of all baremetal hosts.
– Each baremetal server is seen as an independent host which can accommodate up to one instance.
– Scheduling policy is implemented as a part of Nova Scheduler. It uses “extra_specs” metadata to distinguish baremetal hosts from hypervisor hosts.

[Figure: Nova API and Nova Scheduler in front of Nova Compute (BareMetalDriver). The driver registers all hosts by referring to baremetal db (baremetal server information), so the scheduler recognizes all baremetal hosts, each a host of just one instance, tagged with extra_specs=cpu_arch:x86_64.]


Considerations on the Nova Scheduler coupling

dodai-compute
– Scheduling (server selection logic) is up to the driver.
• Currently, there's no intelligence in the driver's scheduler. One-to-one mappings between physical servers and instance types are pre-defined.
• However, it enables users to choose a baremetal server explicitly.

NTTdocomo-openstack
– Scheduling (server selection logic) is up to Nova Scheduler.
• Currently, the standard “Filter Scheduler” is used.
• “instance_type_extra_specs=cpu_arch:x86_64” is used to distinguish baremetal hosts from hypervisor hosts.
• Users cannot choose a baremetal server to use explicitly.

The lack of explicit server selection must be addressed for the ARC usecase. We may use additional “labels” in instance_type_extra_specs, like “instance_type_extra_specs=cpu_arch:x86_64,racklocation:a32”.
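How extra labels could narrow the choice of baremetal hosts can be sketched as a plain key/value match. This is a minimal sketch, not the actual Filter Scheduler code: the function names, the host table, and the capabilities of the hypothetical hypervisor host `kvm01` are all assumptions made for illustration.

```python
def parse_extra_specs(text):
    """Parse 'key:value,key:value' into a dict."""
    specs = {}
    for item in text.split(","):
        key, _, value = item.partition(":")
        specs[key.strip()] = value.strip()
    return specs


def matching_hosts(hosts, requested):
    """Return names of hosts whose capabilities contain every requested pair."""
    return [
        name
        for name, caps in hosts.items()
        if all(caps.get(key) == value for key, value in requested.items())
    ]


# Hypothetical capability table; host names and labels are made up.
hosts = {
    "bm01": {"cpu_arch": "x86_64", "racklocation": "a32"},
    "bm02": {"cpu_arch": "x86_64", "racklocation": "b07"},
    "kvm01": {"hypervisor_type": "kvm"},
}

requested = parse_extra_specs("cpu_arch:x86_64,racklocation:a32")
print(matching_hosts(hosts, requested))
```

Requesting only `cpu_arch:x86_64` matches every baremetal host in this table, while adding a label such as `racklocation:a32` narrows the match to one group of servers.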


OS Provisioning Mechanism


OS Installation Mechanism of dodai-compute1.0

The basic flow of OS installation in dodai-compute1.0
– Management IPs (IPMI) of baremetal servers are stored in the database.
– The driver prepares a boot image and an installation script.
– The actual installation work is handled by the script.

[Figure: BareMetalDriver, PXE boot server (pxe boot image), OS installation server, and baremetal server, with the following steps.]
(1) Fetch the target image from Glance (tar ball of root file system contents), and prepare the installation script.
(2) Pass the installation script URL as a kernel parameter.
(3) Fetch the installation script and run it.
(4) Fetch the image tar ball, and expand it onto the local disk.


OS Installation Mechanism of NTTdocomo-openstack

The basic flow of OS installation in NTTdocomo-openstack
– Management IPs (IPMI) of baremetal servers are stored in the database.
– The driver prepares a boot image and an installation script.
– The actual installation work is handled by the script.

[Figure: BareMetalDriver, PXE boot server (pxe boot image), OS installation server, and baremetal server, with the following steps.]
(1) Fetch the target image from Glance (dd image of root filesystem), and prepare the installation script.
(2) Embed the installation script into the init script.
(3) Export the local disk as an iSCSI LUN, and ask the installation service to fill it.
(4) Attach the iSCSI LUN, and fill it with the dd image.


OS Installation Mechanism

The basic framework is the same for both of them.
– Management IPs (IPMI) of baremetal servers are stored in the database.
– The driver prepares a PXE boot image to start OS installation.
– The actual installation work is handled by scripts in the boot image.

The difference lies only in the actual installation method.
– Installation script of dodai-compute1.0:
• Make partitions and filesystems on the local disk.
• Fetch the tar.gz image and unpack it directly to the local filesystem.
• Install grub to the local disk.
– Installation script of NTTdocomo-openstack:
• Start tgtd (iSCSI target daemon) and export the local disk as an iSCSI LUN.
• Ask the external “Installation Server” to install the OS in that LUN.
• The installation server attaches the LUN and copies the “dd” image to it.
• Grub is not installed. The baremetal server relies on PXE boot even for bootstrapping the OS provisioned on the local disk.

So,...


Considerations on OS Installation Mechanism

We could provide a more general framework which allows multiple installation methods.

Registered machine images need to have metadata to specify:
– Type of installation service
– Installation service's FQDN
• We may use the “properties” attribute of the image.

[Figure: BareMetalDriver, PXE boot server, OS installation servers A and B, and baremetal server, with the following steps.]
(1) Prepare the target image in the corresponding installation service.
(2) Prepare the PXE boot image (initrd script) corresponding to the selected installation service.
(3) The script in the initrd starts the installation using the selected installation service.
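Selecting an installation method from image metadata could look like the following dispatch table. It is a sketch under assumptions: the property keys `install_service_type` and `install_service_fqdn`, the type names `dd` and `tarball`, and the example FQDN are hypothetical, and the handlers only return a description instead of performing an installation.

```python
def plan_dd_install(image_name, fqdn):
    # NTTdocomo-openstack style: the installation server copies a dd image.
    return f"copy dd image '{image_name}' via {fqdn}"


def plan_tarball_install(image_name, fqdn):
    # dodai-compute style: the tar ball is expanded onto the local disk.
    return f"expand tar ball '{image_name}' via {fqdn}"


# Registry of installation service types (hypothetical names).
INSTALLERS = {
    "dd": plan_dd_install,
    "tarball": plan_tarball_install,
}


def plan_installation(image):
    """Pick the handler named in the image's properties."""
    props = image["properties"]
    kind = props["install_service_type"]
    if kind not in INSTALLERS:
        raise ValueError(f"unknown installation service type: {kind!r}")
    return INSTALLERS[kind](image["name"], props["install_service_fqdn"])


image = {
    "name": "ubuntu-root",
    "properties": {
        "install_service_type": "tarball",
        "install_service_fqdn": "install-a.example.org",
    },
}
print(plan_installation(image))
```

A further installation method would then be one more entry in the registry, with no change to the selection logic.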


Considerations on OS Installation Mechanism

Candidates of installation service:
– Existing ones such as in dodai-compute and NTTdocomo-openstack.
– We'd like to add a Kickstart method, too.
• The image contains a ks.cfg file instead of an actual binary image.
• The installation service installs the baremetal server using Kickstart.

Kickstart gives more flexibility and ease of use for customizing image contents.


Network Virtualization


Network configuration of dodai-compute1.0

L2 separation is done by VLAN.
– Each lab has its own fixed VLAN ID assigned on SINET4.
– dodai-compute asks the OpenFlow controller to set up a port/VLAN mapping. The VLAN is explicitly specified by a user.
– Mappings between the baremetal server's NICs and associated switch ports are stored in the database.

OS side configuration is done by the local agent.
– NIC bonding is also configured for redundancy.
– NIC bonding is mandatory in ARC.

[Figure: a baremetal server with a fixed management IP on the management network (used by dodai-compute for PXE boot and agent operations) and a bonded service IP connected to service network switches #1 and #2, which reach SINET4 via VLAN trunking. The OpenFlow controller sets the port/VLAN mapping. Service IP and bonding configuration is done by the local agent based on requests from dodai-compute.]
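The port/VLAN mapping step can be illustrated with a small lookup: given the stored NIC-to-switch-port table and a user-specified VLAN ID, produce the mappings to request from the controller. This is a sketch only; the table layout, the names, and the tuple format are assumptions, and no real OpenFlow controller API is called.

```python
# Hypothetical NIC-to-switch-port table, keyed by (server, NIC).
NIC_TO_SWITCH_PORT = {
    ("bm01", "eth0"): ("switch1", 11),
    ("bm01", "eth1"): ("switch2", 11),
    ("bm02", "eth0"): ("switch1", 12),
    ("bm02", "eth1"): ("switch2", 12),
}


def port_vlan_mappings(server, vlan_id, nic_db):
    """Return sorted (switch, port, vlan) tuples for every NIC of a server."""
    # Usable IEEE 802.1Q VLAN IDs are 1 to 4094 (0 and 4095 are reserved).
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan_id}")
    return sorted(
        (switch, port, vlan_id)
        for (srv, _nic), (switch, port) in nic_db.items()
        if srv == server
    )


print(port_vlan_mappings("bm01", 100, NIC_TO_SWITCH_PORT))
```

Because both bonded NICs of a server map to the same VLAN on different switches, the result contains one entry per service network switch.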


Network configuration of NTTdocomo-openstack

Virtual network is managed by Quantum API and the NEC OpenFlow plug-in.
– L2 separation is done by port-based packet separation using flowtable entries.
– Mappings between the baremetal server's NICs and associated switch ports are stored in the database.
– VLAN based separation needs to be added for ARC usecase.
– When a user specifies more than two NICs, the driver chooses unused NICs from the database and sets up the flowtable entries for associated ports.
– NIC bonding mechanism needs to be added for ARC usecase.

[Figure: a baremetal server with a fixed management IP on the management network (used by BaremetalDriver for PXE boot) and a service IP connected to the service network switch, which is controlled by the OpenFlow controller.]


How will Quantum API be used for ARC usecase?

Using Quantum API and plugin is a preferable choice for ARC. But we need some modification/extension, too.

VLAN based separation needs to be added for ARC usecase.
– Our plan is to add a BareMetal VLAN plugin which configures port/VLAN mappings using flowtable entries, or directly configures port VLANs on Cisco switches.
– This enables not only SINET4 VLAN connection but also interconnection with VM instances using the OVS plugin (via VLAN).

NIC bonding mechanism needs to be added for ARC usecase.
– As all NICs of baremetal servers are registered in the database, we may add redundancy information there. (e.g. NIC-A should be paired with NIC-B for bonding.)
– We may still need a local agent to make the actual bonding configuration.

[Figure: a hypervisor host (OVS plugin) and a baremetal server (BareMetal VLAN plugin, port VLAN) connected to service network switches #1 and #2, which reach SINET4 via VLAN trunking.]
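The redundancy information mentioned above could be stored as a partner reference on each NIC record. The sketch below is an assumption about how such records might look (the `bond_partner` field and the record layout are hypothetical); it derives the bonding pairs a local agent would then configure.

```python
# Hypothetical NIC records with an added redundancy field.
NICS = [
    {"server": "bm01", "nic": "eth0", "bond_partner": "eth1"},
    {"server": "bm01", "nic": "eth1", "bond_partner": "eth0"},
    {"server": "bm01", "nic": "eth2", "bond_partner": None},
    {"server": "bm02", "nic": "eth0", "bond_partner": "eth1"},
]


def bond_pairs(nics, server):
    """Return sorted, unique NIC pairs of a server that name each other as partner."""
    partner = {n["nic"]: n["bond_partner"] for n in nics if n["server"] == server}
    pairs = set()
    for nic, other in partner.items():
        # Accept a pair only when both records agree on the pairing.
        if other is not None and partner.get(other) == nic:
            pairs.add(tuple(sorted((nic, other))))
    return sorted(pairs)


print(bond_pairs(NICS, "bm01"))
```

Requiring both records to agree means a half-registered pair, such as `bm02` in this table, yields no bond rather than a broken one.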


Summary


Target areas for the future extension:

1. Scheduler extension for grouping of baremetal servers.
– Allowing users to specify baremetal servers to be used.

2. Multiple OS provisioning methods.
– Allowing multiple types of OS images such as:
• dd-image (NTTdocomo-openstack style)
• tar ball (dodai-compute style)
• Kickstart installation (new feature)

3. Baremetal Quantum plugin for VLAN inter-connection.
– Allowing inter-connection to existing VLAN networks.
– Allowing NIC-bonding configuration.

As the NTTdocomo-openstack branch has been merged in the upstream, the future extension will be done directly on the upstream.

Etsuji Nakai
Twitter @enakai00

Thank You!