Moving to Nova Cells Without Destroying the World Mike Dorman @misterdorm Senior Systems Engineer, Go Daddy http://x.co/yvrcells


CELLS INTRODUCTION

How to scale nova?

http://docs.openstack.org/openstack-ops/content/scaling.html

CELLS INTRODUCTION

Use cells to overcome …

• Large number of nova-computes
• Single message queue instance
• Complicated scheduling
• Multi-site behind one API

CELLS INTRODUCTION

Cells defined

• Hierarchy of Nova instances
• Each has database, message queue, scheduler, and compute
• Message routing between cells to perform operations
• Top-level API cell for nova-api and cell scheduling
• Overrides the default compute API class
• Lots of caveats
• This is cells v1 (v2 in Liberty)

CELLS INTRODUCTION

[Diagrams: see http://comstud.com/cells.pdf]

CELLS INTRODUCTION

More details to get started

• Nova cells configuration reference
  http://docs.openstack.org/juno/config-reference/content/section_compute-cells.htm
• openstack-dev cells discussions
  http://www.gossamer-threads.com/lists/openstack/dev/16277
• CERN’s cells architecture
  http://openstack-in-production.blogspot.com/2014/03/cern-cloud-architecture-update-for.html
• Folsom cells design summit slides
  http://comstud.com/FolsomCells.pdf
• Exploring OpenStack Nova Cells
  http://www.dorm.org/blog/exploring-openstack-nova-cells/
• Talks by Rackspace, CERN, NeCTAR

PLANNING THE CONVERSION

Goals

• Get to cells before a scaling fire drill
• Keep nova RMQ, DB close to compute nodes
• Maintain existing instance state
• Little or no downtime

PLANNING THE CONVERSION

Basic plan

• Existing nova becomes first compute cell
• Split RMQ cluster
• Create new nova instance for API cell
• Import data to API cell
• Keep using the existing nova-api service until final cutover


ENVIRONMENT PREP

Getting ready

• New servers for the API cell services
• Database for nova API cell
• Migrate non-nova services to new machines
• Network ACLs
• Check DNS

ENVIRONMENT PREP

Extra credit: Split RabbitMQ cluster

• Not strictly necessary!
• To minimize downtime and maintain state
• First add new nodes
• Split and contract cluster

ENVIRONMENT PREP

Expand RabbitMQ cluster

[Diagram sequence. Labels: heat, neutron, glance, nova, ceilometer; Original RMQ/App Servers (to be: compute cell); New RMQ/App Servers (to be: API cell)]

ENVIRONMENT PREP

Reconfigure non-nova services

[Diagram. Labels: heat, neutron, glance, nova, ceilometer; Original RMQ/App Servers (to be: compute cell); New RMQ/App Servers (to be: API cell)]

ENVIRONMENT PREP

Split brain

[Diagram. Labels: heat, neutron, glance, nova, ceilometer; Original RMQ/App Servers (to be: compute cell); New RMQ/App Servers (to be: API cell)]

ENVIRONMENT PREP

Remove opposite nodes

[Diagram. Labels: heat, neutron, glance, nova, ceilometer; Compute Cell Servers (Original RMQ/App Servers); API Cell Servers (New RMQ/App Servers)]
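
The expand, split, and contract sequence could be sketched with the standard `rabbitmqctl` clustering commands. This is only an illustration, not the procedure used in the talk: the node names are hypothetical, and how the two halves are isolated from each other (for example with the network ACLs mentioned earlier) is an assumption left outside the script. `forget_cluster_node` expects the node being removed to be unreachable, so check the behaviour against your RabbitMQ version first. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: node names are hypothetical placeholders, not from the talk.
ORIG_NODES="rabbit@orig1 rabbit@orig2"   # to be: compute cell
NEW_NODES="rabbit@new1 rabbit@new2"      # to be: API cell
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (run on each new node): join the existing cluster.
expand_cluster() {
    seed="${ORIG_NODES%% *}"             # first original node
    run rabbitmqctl stop_app
    run rabbitmqctl join_cluster "$seed"
    run rabbitmqctl start_app
}

# Step 2 (once the two halves can no longer reach each other):
# each side forgets the nodes on the opposite side.
# $1 = "orig" when run on an original node, "new" when run on a new node.
contract_cluster() {
    case "$1" in
        orig) remove="$NEW_NODES" ;;
        new)  remove="$ORIG_NODES" ;;
        *)    echo "usage: contract_cluster orig|new" >&2; return 1 ;;
    esac
    for node in $remove; do
        run rabbitmqctl forget_cluster_node "$node"
    done
}
```

Calling `expand_cluster` on a new node and later `contract_cluster new` (or `contract_cluster orig` on the original side) prints the commands for review; set `DRY_RUN=0` only after testing in a non-production cluster.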

CONFIGURE COMPUTE CELL

Set up record for parent cell

nova-manage cell create \
  --name=api --cell_type=parent \
  --username=api_rmq_user --password=api_rmq_pass \
  --hostname=api_rmq_host --virtual_host=api_rmq_vhost

• Use the API cell RMQ servers!
• Or use cells_config option and put this in json
  http://docs.openstack.org/juno/config-reference/content/section_compute-cells.html#cell-config-optional-json
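
For the cells_config alternative, the Juno configuration reference linked above describes a JSON file keyed by cell name. A minimal sketch of the parent-cell record, reusing the placeholder credentials from the command above, might look like the following. The output path, the port, and the api_url value are assumptions for illustration; check the field names against the reference for your release.

```shell
#!/bin/sh
# Sketch only: the output path, port, and api_url are assumptions.
CELLS_JSON="${CELLS_JSON:-./cells.json}"

# Write a parent-cell record equivalent to the nova-manage command above.
cat > "$CELLS_JSON" <<'EOF'
{
    "api": {
        "name": "api",
        "api_url": "http://api.example.com:8774",
        "transport_url": "rabbit://api_rmq_user:api_rmq_pass@api_rmq_host:5672/api_rmq_vhost",
        "weight_offset": 0.0,
        "weight_scale": 1.0,
        "is_parent": true
    }
}
EOF

echo "Wrote $CELLS_JSON; point cells_config in the [cells] section at this file."
```

Keeping the cell records in a file makes them easy to manage with configuration management, at the cost of no longer being able to change them with nova-manage.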

CONFIGURE COMPUTE CELL

[Diagram: see http://comstud.com/cells.pdf]

CONFIGURE COMPUTE CELL

Enable nova-cells in compute cell

[cells]
enable = true
name = cell_01
cell_type = compute

• Start up nova-cells, verify connections to RMQ
• Do not restart nova-api after this!

CONFIGURE COMPUTE CELL

Disable quotas in compute cell

• Quotas will be enforced by the API cell

[DEFAULT]
quota_driver=nova.quota.NoopQuotaDriver

BOOTSTRAP NOVA FOR API CELL

Install & configure nova as usual

• Install packages, db sync
• Use the API cell RMQ servers!
• Configure cells options

[cells]
enable = true
name = api
cell_type = api

• Don’t start services yet (need to import data)

BOOTSTRAP NOVA FOR API CELL

Set up record for child cell

nova-manage cell create \
  --name=cell_01 --cell_type=child \
  --username=comp_rmq_user --password=comp_rmq_pass \
  --hostname=comp_rmq_host --virtual_host=comp_rmq_vhost

• Use the compute cell RMQ servers!
• Remember cells_config/json option

BOOTSTRAP NOVA FOR API CELL

[Diagram: see http://comstud.com/cells.pdf]

IMPORT NOVA DATA

Seed API cell data

• API cell needs flavor, quota, instance, etc. data
• Must do this directly in SQL
• Shut down nova-api to prevent changes while you do this

mysqldump nova_orig_db table_name | \
  mysql nova_api_cell_db

IMPORT NOVA DATA

Tables to import

• instance_types
• instance_type_extra_specs
• instance_type_projects
• instances
• instance_info_caches
• block_device_mapping
• instance_system_metadata
• instance_groups
• instance_group_member
• instance_group_metadata
• instance_group_policy
• key_pairs
• quota_classes
• quota_usages
• quotas
• snapshots
• snapshot_id_mappings
• virtual_interfaces
• volumes
• May be others you need!
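
The per-table mysqldump command can be wrapped in a loop over the table list. The sketch below assumes the database names from the earlier slide and that both databases are reachable with the default client credentials; the DRY_RUN switch and the function names are hypothetical helpers added for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: database names follow the slide; credentials and hosts are
# assumed to come from the default mysql client configuration.
SRC_DB="${SRC_DB:-nova_orig_db}"
DST_DB="${DST_DB:-nova_api_cell_db}"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute

# Tables listed on the slide; extend as needed for your deployment.
TABLES="instance_types instance_type_extra_specs instance_type_projects
instances instance_info_caches block_device_mapping
instance_system_metadata instance_groups instance_group_member
instance_group_metadata instance_group_policy key_pairs quota_classes
quota_usages quotas snapshots snapshot_id_mappings virtual_interfaces
volumes"

# Copy one table. With default options, mysqldump output drops and
# recreates the table in the destination before loading the rows.
import_table() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ mysqldump $SRC_DB $1 | mysql $DST_DB"
    else
        # Note: in plain sh the pipeline status is that of mysql only.
        mysqldump "$SRC_DB" "$1" | mysql "$DST_DB"
    fi
}

# Copy every table in the list, stopping at the first reported failure.
import_all() {
    for table in $TABLES; do
        import_table "$table" || { echo "import failed: $table" >&2; return 1; }
    done
}

import_all
```

Run it once in dry-run mode to review the command list, and only set `DRY_RUN=0` after nova-api has been shut down as described above.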

RESTART SERVICES

Start up all nova services

API Cell
• nova-cells
• nova-api
• nova-consoleauth *
• nova-spicehtml5proxy
• nova-serialproxy

Compute Cell
• nova-cells
• nova-cert
• nova-conductor
• nova-console
• nova-scheduler
• nova-network
• nova-compute
• (Maybe nova-api)

* http://blog.mgagne.ca/nova-cells-and-console-access/

CAVEATS

YMMV

nova-cells is considered experimental

Test it!

So it won’t blow up in your face!


CAVEATS

Things that just don’t work

• Neutron vif plugging notifications to nova
  vif_plugging_is_fatal = false
  vif_plugging_timeout = 5
  (But this causes a race condition)
• Any notifications between cells and other services
  ceilometer
  http://openstack-in-production.blogspot.com/2014/03/cern-cloud-architecture-update-for.html

CAVEATS

Things that just don’t work

• nova cells-list “circular reference detected” bug
  https://bugs.launchpad.net/nova/+bug/1312002
  https://review.openstack.org/#/c/106991/2/nova/cells/state.py
• Console Auth
  Make sure to set cells/enable=true on all node types
  http://blog.mgagne.ca/nova-cells-and-console-access/

CAVEATS

Some objects are not cell-aware

• Flavors and Server Groups
  Must exist in API cell and compute cell DB (with same IDs!)
  https://github.com/NeCTAR-RC/nova/commit/5abc8847dc89b162b6ae678176a5cfe4989144a9
• Block Devices
  http://blog.mgagne.ca/nova-cells-and-block-device-mapping/
• Security groups
• ???

CAVEATS

Host aggregates and availability zones

nova-api server read cell state from DB:
https://github.com/NeCTAR-RC/nova/commit/6fe7057fb4957485d3bac06579ddc38c93458064

Add AZ support for cells:
https://github.com/NeCTAR-RC/nova/commit/048bd2d6d438fb8fa9ad7d3e0d57e7d03c546f6f

Support aggregate API in cells:
https://github.com/NeCTAR-RC/nova/commit/8ca8828d191bc271460eb80567717fd15ef6167c

Ability to filter cells capacity report:
https://github.com/NeCTAR-RC/nova/commit/97921ef1010c5e5bca357d77682bd0ee42d6ffcc

Print cell name in cell timeout exceptions:
https://github.com/NeCTAR-RC/nova/commit/60f669ba1ed5221d71138a72fb2cf3b34c07a970

Use sysmetadata to get instances AZ in API cell:
https://github.com/NeCTAR-RC/nova/commit/95e4cccac623c601e074a618ea71d121a359e00f

Use sysmetadata to get instance_name in API cell:
https://github.com/NeCTAR-RC/nova/commit/6bf1cf78b86bed99733e1119b891397dee15a65e

FOSS FTW!

Thanks!


CAVEATS

Other issues

• nova.cells.messaging errors
  nova.cells.messaging OperationalError: (OperationalError) (1048, "Column 'instance_uuid' cannot be null") 'UPDATE instance_extra SET updated_at=%s, instance_uuid=%s WHERE instance_extra.id = %s'
  No clue on this, but it doesn’t seem to break anything
• Database consistency between API and compute cells
  Communication interruptions between cells can cause inconsistencies
  Use case for running nova-api in compute cells
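
One rough way to spot drift between the API cell and a compute cell is to compare simple aggregates across the two databases. The sketch below compares the number of non-deleted rows in the instances table; the database names, the query, and the helper functions are assumptions for illustration, and equal counts only make sense with a single compute cell. Matching counts do not prove the rows agree.

```shell
#!/bin/sh
# Sketch only: DB names and the chosen query are assumptions for illustration.
API_DB="${API_DB:-nova_api_cell_db}"
CELL_DB="${CELL_DB:-nova_orig_db}"
QUERY="SELECT COUNT(*) FROM instances WHERE deleted = 0"

# Run the query against one database and print the bare count.
# -N suppresses the column header, -B selects batch (tab-separated) output.
count_instances() {
    mysql -N -B -e "$QUERY" "$1"
}

# Compare two counts and report; returns non-zero on mismatch.
compare_counts() {
    if [ "$1" = "$2" ]; then
        echo "OK: both cells report $1 non-deleted instances"
    else
        echo "MISMATCH: API cell=$1 compute cell=$2"
        return 1
    fi
}

# Usage (needs mysql client access to both databases):
#   compare_counts "$(count_instances "$API_DB")" "$(count_instances "$CELL_DB")"
```

A mismatch is a cue to look for instances whose state changed while the cells could not talk to each other.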

CELLS V2

A better way forward for nova

• Cells is the default mode
• No nova-cells service
• nova-api calls directly to each cell’s DB and message queue

https://wiki.openstack.org/wiki/Nova-Cells-v2
https://etherpad.openstack.org/p/kilo-nova-cells-manifesto

CELLS V2

Give me Liberty or give me death!

• Experimental in Liberty
• Transition from no cells to cells v2 should be seamless
• Unclear how cells v1 will migrate to v2
• Unless you really need to go to cells right now … wait for Liberty

Thank you!

@misterdorm
Freenode: [email protected]

http://x.co/yvrcells