Slide 1/18 IBM Research Lilliput meets Brobdingnagian: Data Center Systems Management through Mobile Devices Jan Rellermeyer, Thomas Osiecki, Michael Kistler, Ahmed Gheith 3 rd International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology (DCDV) Held in conjunction with Dependable Systems and Networks (DSN) Budapest, Hungary June 18, 2013 Saurabh Bagchi, Fahad Arshad


Slide 1/18 IBM Research

Lilliput meets Brobdingnagian: Data Center Systems Management through Mobile Devices

Jan Rellermeyer, Thomas Osiecki, Michael Kistler, Ahmed Gheith

3rd International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology (DCDV)

Held in conjunction with Dependable Systems and Networks (DSN)

Budapest, Hungary, June 18, 2013

Saurabh Bagchi, Fahad Arshad


Slide 2/18 IBM Research

System Management Workflow

[Figure: systems management workflow; labels: "Something is wrong!", "Patch"]


Slide 3/18 IBM Research

Systems Management: A Changed View

[Figure: changed view of systems management; labels: "Filtering", "Patch"]


Slide 4/18 IBM Research

So What Exactly Are the Changes?

1. Platform being used for doing the systems management

Server
1. Large screen
2. Resource rich
3. Within organization’s security perimeter
4. High dependability

Mobile devices
1. Small screen
2. Resource constrained
3. Outside organization’s security perimeter
4. Lower dependability


Slide 5/18 IBM Research

So What Exactly Are the Changes?

2. Layered systems management to flat hierarchy

[Figure label: "Filtering"]


Slide 6/18 IBM Research

Case Study: IBM Research’s IBM Remote Project

Always Connected, Instantaneous, Focused, Simple

User Interface
  – visualization of complex data
  – relevance first
  – drill-down UI

Communication
  – direct connection to the managed machines
  – refresh rate vs. power consumption

[Figure label: IBM Blade Centers]


Slide 7/18 IBM Research

Case Study: IBM Remote Project


Slide 8/18 IBM Research

Research Challenges Due To The Changes

1. Platform being used for doing the systems management: Server to Mobile Devices

I. How do we optimize the scarce resources of the systems management platforms? Primarily, battery and communication bandwidth.

II. How do we handle the fact that the platforms will be insecure and fault-intolerant for parts of their operation?

III. How do we visualize the (hopefully) rare failure event in a deluge of systems monitoring data?


Slide 9/18 IBM Research

Research Challenges Due To The Changes

2. Layered systems management to flat hierarchy

I. Can we avoid chaos due to the looser coordination?

II. Can we leverage overlap between interests to cut down on traffic to individual mobile devices?


Slide 10/18 IBM Research

Solution Directions for Question 1

I. How do we optimize the scarce resources of the systems management platforms? Primarily, battery and communication bandwidth.

1. Platform being used for doing the systems management: Server to Mobile Devices

• Minimize number of messages, while still receiving enough to reliably detect failures
  – Use publish-subscribe or other push mechanism, in preference to pull mechanism
  – BUT: Most hardware management modules do not support push
  – Use an intermediate server for aggregation and filtering

• Apply principles of rare event detection
  – Non-events occur with much higher frequency than events of interest
  – BUT: Requires model of events: time distribution, correlation, etc.
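A minimal sketch of the intermediate-server idea above, assuming the server polls pull-only management modules itself and pushes to mobile devices only when a machine's status changes. The class name `Aggregator`, the three status levels, and the `push` callback are illustrative assumptions, not part of IBM Remote.

```python
from typing import Callable, Dict

# Hypothetical status levels, ordered by severity (an assumption for this sketch).
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


class Aggregator:
    """Polls pull-only management modules and pushes only status changes."""

    def __init__(self, push: Callable[[str, str], None],
                 min_severity: str = "warning") -> None:
        self.push = push                        # delivers (machine, status) to subscribers
        self.min_level = SEVERITY[min_severity]
        self.last_state: Dict[str, str] = {}    # machine -> last status seen

    def poll_once(self, readings: Dict[str, str]) -> int:
        """Process one polling round; return the number of messages pushed."""
        sent = 0
        for machine, status in readings.items():
            previous = self.last_state.get(machine, "ok")
            self.last_state[machine] = status
            if status == previous:
                continue  # non-event: nothing changed, send nothing
            # Forward the change if either the old or the new status is severe
            # enough, so recoveries are reported as well as failures.
            if max(SEVERITY[status], SEVERITY[previous]) >= self.min_level:
                self.push(machine, status)
                sent += 1
        return sent
```

With this filter, repeated identical readings cost the mobile device nothing; only transitions into or out of a warning or critical status generate traffic.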


Slide 11/18 IBM Research

Solution Directions for Question 1

II. How do we handle mismatch in dependability characteristics (between target platform and management platform)?

– Mobile device can be physically compromised and OS-level protection can be bypassed

– Mobile devices are often employee owned

1. Platform being used for doing the systems management: Server to Mobile Devices

• Application security and server-side security need to be built in
  – Periodic authentications, not one-time authentications
  – Biometric-based authentication
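One way to picture periodic authentication is a server-side session that stops being valid unless the user re-authenticates within a fixed interval. The sketch below assumes a hypothetical `Session` class and a caller-supplied credential check (password, biometric, or otherwise); it is an illustration, not the mechanism used in any product.

```python
import time
from typing import Callable, Optional


class Session:
    """Server-side session that expires unless the user re-authenticates."""

    def __init__(self, reauth_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.reauth_interval_s = reauth_interval_s
        self.clock = clock                      # injectable so tests can control time
        self.last_auth: Optional[float] = None  # time of last successful authentication

    def authenticate(self, credential_ok: bool) -> bool:
        """Record a successful authentication; a failed one changes nothing."""
        if credential_ok:
            self.last_auth = self.clock()
        return credential_ok

    def is_valid(self) -> bool:
        """True only if the last successful authentication is recent enough."""
        if self.last_auth is None:
            return False
        return self.clock() - self.last_auth < self.reauth_interval_s
```

The server would check `is_valid()` before every management action, so a stolen or unattended device loses access once the interval lapses.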


Slide 12/18 IBM Research

Solution Directions for Question 1

III. How do we visualize the needle in the haystack?
  – Needle: Outages, failures, or behavior that is indicative of an imminent failure
  – Haystack: Deluge of monitored data about target platforms
  – Screen real estate is limited

1. Platform being used for doing the systems management: Server to Mobile Devices

• First off, deliver only a small superset of relevant messages
  – Push notification, such as through Google Cloud Messaging (GCM)

• Drill-down views, starting with summary alert view for all machines in data center
  – Followed up with root cause analysis techniques that run on servers
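The drill-down idea can be sketched as two functions: a summary that fits on a small screen and a per-machine detail view. The `Event` record, its three severity levels, and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    machine: str
    severity: str   # "info", "warning", or "error" (assumed levels)
    message: str


def summarize(events: List[Event]) -> List[Tuple[str, int, int]]:
    """Summary view: (machine, errors, warnings) rows, worst machines first."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for e in events:
        if e.severity in ("warning", "error"):
            counts[e.machine][e.severity] += 1
    rows = [(m, c["error"], c["warning"]) for m, c in counts.items()]
    # Sort by error count, then warning count (both descending), then name.
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))


def drill_down(events: List[Event], machine: str) -> List[Event]:
    """Detail view: the non-informational events for one machine."""
    return [e for e in events if e.machine == machine and e.severity != "info"]
```

Machines with only informational events never appear in the summary, which keeps the first screen limited to the needle rather than the haystack.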


Slide 13/18 IBM Research

Solution Directions for Question 2

I. Tight vertical integration of different software layers implies different domain experts will be concurrently involved in problem troubleshooting

2. Layered systems management to flat hierarchy, OR Crowdsourcing systems management

• Relevant features of social media will be used
  – Example: At IBM, you can “friend” specific Blade Centers and have “circles” of administrators

• Role-based Access Control (RBAC) can be used for security control of different software layers
  – Fine-grained roles can be assigned
  – RBAC solutions exist for sophisticated management of these roles, such as hierarchies, overlaps, and transience
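A small sketch of role hierarchies and overlapping roles in RBAC. The role names, permissions, and inheritance table below are invented for illustration; transience (time-limited roles) is not modeled here.

```python
from typing import Dict, Iterable, Set

# Hypothetical roles and the permissions each grants directly.
ROLE_PERMS: Dict[str, Set[str]] = {
    "viewer": {"read_health"},
    "os_admin": {"restart_service"},
    "hw_admin": {"power_cycle"},
    "site_admin": set(),
}

# Hypothetical hierarchy: each role inherits everything its parents grant.
ROLE_PARENTS: Dict[str, Set[str]] = {
    "os_admin": {"viewer"},
    "hw_admin": {"viewer"},
    "site_admin": {"os_admin", "hw_admin"},
}


def effective_permissions(role: str) -> Set[str]:
    """Collect a role's own permissions plus those inherited from ancestors."""
    seen: Set[str] = set()
    perms: Set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited (shared ancestor)
        seen.add(current)
        perms |= ROLE_PERMS.get(current, set())
        stack.extend(ROLE_PARENTS.get(current, ()))
    return perms


def is_allowed(user_roles: Iterable[str], permission: str) -> bool:
    """A user may hold several overlapping roles; any one of them suffices."""
    return any(permission in effective_permissions(r) for r in user_roles)
```

Here an administrator of the operating-system layer can read health data and restart services but cannot power-cycle hardware, while a site administrator inherits both layers.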


Slide 14/18 IBM Research

Solution Directions for Question 2

II. Overlap between interests of multiple mobile devices and their geographical proximity

2. Layered systems management to flat hierarchy, OR Crowdsourcing systems management

• Commonalities of interest can be used to cut down on cellular bandwidth usage
  – Commonalities can exist due to proximal geographic location or overlap among system administration responsibilities
  – Distribute information to a subset of mobile devices and then use local communication (Bluetooth, Wi-Fi) to disseminate information among proximal devices
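A rough sketch of the dissemination idea: for one update, send a single cellular message per location and let that device relay to nearby peers. The function name `plan_delivery`, the subscription format, and the rule of picking the alphabetically first device as relay are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def plan_delivery(subscriptions: Dict[str, Tuple[str, Set[str]]],
                  topic: str) -> Dict[str, List[str]]:
    """Map each relay device to the nearby peers it serves over local radio.

    subscriptions: device -> (location, topics the device is interested in).
    Only the relays (the keys of the result) receive the update over cellular.
    """
    by_location: Dict[str, List[str]] = defaultdict(list)
    for device, (location, topics) in subscriptions.items():
        if topic in topics:
            by_location[location].append(device)

    plan: Dict[str, List[str]] = {}
    for members in by_location.values():
        members.sort()                  # deterministic relay choice for the sketch
        plan[members[0]] = members[1:]  # first device relays to the rest
    return plan
```

The number of cellular messages drops from the number of interested devices to the number of locations that contain at least one of them. A real system would also need to weigh relay battery level and trust between devices, which this sketch ignores.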


Slide 15/18 IBM Research

Case Study: IBM Remote

• Health view (left) broken into critical, non-critical, and system-level health messages

• Event log view (right) is filtered to show only warnings and errors


Slide 16/18 IBM Research

Related Work

• Much work on managing mobile devices, which is the opposite direction from what we are discussing in this paper
  – Some work on mobile agents for managing servers [18 – NOMS02, 19 – Software07]
  – Sophistication lies in designing a dynamic set of agents whose monitoring policies can be changed on the fly

• Some commercial prototypes for monitoring and control of target end points from mobile devices
  – UCSand for Android devices [21] for Cisco Unified Systems monitoring and control
  – PCMonitor [22] from MMSoft Design Ltd.
  – VMware vCenter Mobile Access [23] is a virtual appliance on the server side for managing a datacenter from mobile devices
  – Recent offering from HP [18]


Slide 17/18 IBM Research

Take-away Lessons

• A changed vision of systems management is happening – mobile clients being used to manage large masses of physical and virtual servers

• This throws open some technical challenges

1. Management to be done through resource-constrained mobile devices which have lower dependability than target devices

2. Crowd-sourcing of systems management, rather than linear flow of control through hierarchies of sysadmins

• These challenges are being addressed in multiple projects at commercial organizations, including in the IBM Remote project at IBM Research


Slide 18/18 IBM Research

Presentation available at: Dependable Computing Systems Lab (DCSL) web site

engineering.purdue.edu/dcsl