DASM Self-Service Framework
Technical White Paper May 2009
www.datasynapse.com p. 2
Confidentiality Notice and Disclaimer
Copyright © 2009 DataSynapse, Inc. All Rights Reserved.
Neither this document nor any of its contents may be used or disclosed without the express written consent of DataSynapse. This document does
not carry any right of publication or disclosure to any other party. While the information provided herein is believed to be accurate and reliable,
DataSynapse makes no representations or warranties, express or implied, as to the accuracy or completeness of such information.
Only those representations and warranties contained in a definitive license agreement shall have any legal effect. In furnishing this document,
DataSynapse reserves the right to amend or replace it at any time and undertakes no obligation to provide the recipient with access to any
additional information. Nothing contained within this document is or should be relied upon as a promise or representation as to the future.
GridServer® is a registered trademark. DataSynapse, the DataSynapse logo, GridClient, GridBroker, DASM, FabricBroker, LiveCluster, VersaUtility,
VersaVision, SpeedLink and RTI Design are trademarks. GRIDesign is a registered servicemark of DataSynapse, Inc. All other product names are
trademarks or registered trademarks of their respective companies.
DataSynapse products are protected by U.S. Patent No. 6,757,730, U.S. Patent No. 7,093,004 and U.S. Patent No. 7,130,891; other patents pending.
Table of Contents
Introduction
Key Insights
Comparative Analysis
Benefits
System Architecture
Roles and Responsibilities
Operating Model
Service Levels
Capacity Planning Model
Introduction
The DASM Self-Service Framework (SSF) provides an elastic IT operating environment for creating and managing application infrastructure on system resources at the click of a button. Unlike existing cloud and virtualization management solutions, SSF allows rapid re-purposing of multiple physical or virtual machines with complex multi-component application infrastructure configurations.
This approach is unique in that the multi-component application infrastructure configurations are managed independently of the physical or virtual machines in which they reside. This paper addresses the model, use cases, roles, responsibilities, and underlying technology that underpin this framework.
Feature | Production Environments | Non-Production Environments
Need for repurposing systems | Low to Medium | Very High
Change controls | Very High | Low in Development; Medium in Quality Assurance; High in Pre-Production Staging
Scalability requirements | High | Low in general; High in benchmark testing
Availability requirements | Very High | Low to Medium
It is important to note that because of the inherent differences between Production and Non-Production
environments, various features of SSF provide different levels of benefits within Production and Non-
Production environments. Nevertheless, in this whitepaper, we will discuss SSF as it applies across the entire
spectrum of a typical IT datacenter.
Before diving into the details of how SSF works, we will focus on the key insights required to understand it. These insights prepare the reader for a comparative analysis of SSF against other solutions in the application infrastructure landscape, and lay the groundwork for understanding the benefits of implementing SSF.
Following the discussion on benefits, we will discuss the SSF system architecture, roles and responsibilities,
the operational model, impact on service levels, and the capacity planning model required to realize the SSF
system architecture in practice.
Key Insights
The key insights required to understand DASM SSF are as follows:
- DASM SSF provides 1-click capability to create complex distributed application infrastructure configuration images from templates and run them within the SSF environment
- SSF images and templates do not contain any OS or application platform binary distributions; both templates and images are typically only a few megabytes in size
- An SSF image is run by transiently inserting the image into any relevant operating system (OS) image set: the complete runtime stack, comprising the OS layer, application platform layer, and application components, is automatically assembled at image startup and disassembled at image shutdown
- If changes are made to a running SSF image, these changes are automatically captured within the SSF image at image shutdown
- The interconnections among distributed SSF image components are dynamically established at image startup
- Clustered components within a running SSF image are capable of dynamic clustering in response to varying load experienced by the applications hosted within the cluster
- Multiple, time-stamped versions of an SSF image can be saved in an image repository, and any saved version can be restored from the repository and run within any SSF environment
- SSF images can be easily copied from one environment to another
- Data associated with some SSF image components, such as database servers or LDAP directory servers, may optionally be mapped to non-transient shared SAN or NAS storage, even though the runtime processes are always inserted into a transient OS image
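The lifecycle described in these insights can be sketched in a few lines of illustrative Python. This is a hypothetical model, not the DataSynapse API: the class and method names (`SSFImage`, `start`, `shutdown`) are invented for illustration. The point it demonstrates is that the image carries only configuration metadata; the OS and platform layers are supplied transiently at startup, and configuration changes made while running are captured back into the image at shutdown.

```python
# Hypothetical sketch of the SSF image lifecycle; names are illustrative,
# not the real DataSynapse API.

class SSFImage:
    """Holds only configuration metadata -- no OS or platform binaries."""
    def __init__(self, config):
        self.config = dict(config)      # a few megabytes of metadata at most

    def start(self, os_image, platform_binaries):
        # Assemble the complete runtime stack transiently at startup.
        return {
            "os": os_image,
            "platform": platform_binaries,
            "app_config": dict(self.config),
        }

    def shutdown(self, running_stack):
        # Capture changes made to the running image back into the SSF
        # image; the transient OS/platform layers are simply discarded.
        self.config = dict(running_stack["app_config"])

image = SSFImage({"cluster_members": 2})
stack = image.start("rhel5-x86", "websphere-6.1")
stack["app_config"]["cluster_members"] = 4   # change made while running
image.shutdown(stack)
print(image.config["cluster_members"])       # -> 4
```

Because the image never embeds the OS layer, the same image can later be started against a different compatible OS image set, which is the property the following example relies on.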
Let us try to visualize the insights we have noted above in the context of an example.
Example WebSphere Cell SSF Image
Let us discuss the key insights in the context of the example WebSphere Cell SSF image shown in Figure 1. This example SSF image defines the following application infrastructure configuration:
- An IBM HTTP Web Server, with a WebSphere Plug-in capable of routing traffic to WebSphere cluster members
- An IBM WebSphere Deployment Manager controlling the WebSphere cell, including the WebSphere cluster
- A WebSphere Cluster with two application server nodes, each node running a cluster member, whereby each node is part of the WebSphere cell defined by the Deployment Manager. The cluster has dynamic clustering capability: new nodes and cluster members can be added on the fly if the applications hosted within the cluster require horizontal scaling, and information about cluster members dynamically added to or removed from the cluster is propagated to the IBM HTTP web server plug-in
- An Oracle 10g database connected to non-transient data storage in a SAN
- A Sun Directory Server connected to non-transient data storage in a SAN
This SSF image is transiently inserted into the OS image set shown in Figure 1.
However, this SSF image can be lifted from the current OS image set and inserted into an
entirely different compatible OS image set with a completely different network profile.
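Under stated assumptions, the Figure 1 configuration could be described by metadata along the following lines. This is an illustrative data shape only; the field names are hypothetical and do not reflect DataSynapse's actual template format. It shows why such an image stays small (no binaries, only component descriptions) and which components carry non-transient data.

```python
# Illustrative, metadata-only description of the example WebSphere Cell
# SSF image. Component names follow Figure 1; field names are hypothetical.

websphere_cell_template = {
    "components": [
        {"name": "ihs-web",     "type": "IBM HTTP Server",
         "plugin_routes_to": "was-cluster"},
        {"name": "dmgr",        "type": "WebSphere Deployment Manager"},
        {"name": "was-cluster", "type": "WebSphere Cluster",
         "min_members": 2, "dynamic_clustering": True},
        {"name": "oracle-db",   "type": "Oracle 10g",
         "data_storage": "SAN"},        # non-transient data
        {"name": "ldap",        "type": "Sun Directory Server",
         "data_storage": "SAN"},        # non-transient data
    ],
}

# Components without mapped SAN/NAS storage are fully transient: their
# state lives entirely inside the SSF image.
transient = [c["name"] for c in websphere_cell_template["components"]
             if "data_storage" not in c]
print(transient)   # -> ['ihs-web', 'dmgr', 'was-cluster']
```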
Figure 1: Self-Service Image
Comparative Analysis
BladeLogic and Tivoli Comparative Analysis
In non-virtualized infrastructure environments, the capability to create and run complex, distributed application infrastructure configurations from templates is available in the form of a persistent install of distributed components into specific OS images, using solutions such as those offered by BladeLogic or Tivoli. This approach can be characterized as a fast-provisioning approach, whereby a complex distributed configuration can be created fairly quickly from standardized configurations.
Under this approach, once the distributed application infrastructure configuration is installed into an OS image set, it is locked into that OS image set. Any changes made to this working configuration cannot be copied out of the underlying OS image set. If you want to create a copy of a working application infrastructure configuration, you have to start all over again. If you have to move the configuration to a new OS image set, you likewise have to start all over again.
In non-volatile, stable production environments, this approach works reasonably well, but it is sub-optimal for non-production environments, where change frequency is very high. It is also not the best approach for production environments characterized by volatile application loads, because it does not lend itself to dynamic clustering.
In summary, this approach offers limited flexibility in copying and migrating configurations within the application development life-cycle.
VMware Lab Manager Comparative Analysis
Readers may spot many concepts common to SSF and virtual infrastructure management tools such as VMware Lab Manager. Therefore, the following questions need to be discussed:
- Why bother creating SSF images that are disconnected from OS images?
- Why not store the distributed application infrastructure configuration within virtual machines created from templates? In fact, VMware Lab Manager offers exactly such a capability.
The two questions raised above are central to understanding the motivation behind the unique approach offered by DASM SSF, and we offer answers to them below.
We first start by examining how VMware Lab Manager works. We have selected VMware Lab Manager
because it is the most mature product in its genre. Other products in this area are still being developed by
companies such as Microsoft and Citrix. The important point to note is that regardless of the vendor,
VMware Lab Manager is representative of an approach whereby application infrastructure configuration is
stored locked in with the OS image set.
VMware Lab Manager can quickly create distributed application infrastructure topologies from configurations stored in an image library. Each configuration in the image library comprises one or more machine templates, where each template contains a guest operating system and the relevant application components. Before a configuration can be used to create a working topology, a copy of the configuration is checked out from the image library and deployed to create the working topology. To optimize storage, each change to the working copy is stored as a delta disk change from the base configuration image.
The VMware Lab Manager approach works reasonably well, at least for some specific use cases in labs (hence the name of the product), but consider the following limitations of this approach:
1. How do you take an existing working copy of a configuration within VMware Lab Manager and move it to a completely different environment outside the VMware infrastructure? That is not possible, and the root cause of this limitation is that the OS image set and the application infrastructure configuration are locked together. This creates problems during life-cycle stage management. By way of contrast, the SSF approach has no such limitation.
2. Imagine a situation where you need to perform benchmark testing on the configuration shown in Figure 1, and you want to dynamically scale the cluster during your benchmark test to find the minimum number of cluster members needed to meet the baseline load on the application. How do you do that under VMware Lab Manager? You will need to manually clone and configure a new virtual machine into the working copy of the configuration. By way of contrast, the SSF approach offers dynamic clustering as an integral feature.
3. Imagine a situation where there are five configurations like the one shown in Figure 1, differing only in minor security and performance settings within the Deployment Manager. Within VMware Lab Manager, you will need to create five different images within the configuration library, each essentially the same except for minor differences within the Deployment Manager configuration. This results in a proliferation of configurations within the image library, and the root cause of this proliferation can be directly attributed to the fact that VMware Lab Manager does not separate application components from the OS images.
4. Now imagine a situation where there are hundreds of working copies of various configurations in use within the Lab Manager environment, and you have to apply a patch to an application platform binary distribution. You now have to use some facility to apply patches to possibly hundreds of virtual machines. By way of contrast, the SSF approach requires patching only a single application platform binary distribution.
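The patching contrast in point 4 follows directly from the decoupling of images and binaries, and can be sketched as follows. This is a hypothetical illustration, not DataSynapse's implementation: because SSF images reference a shared platform distribution rather than embedding it, patching that one distribution reaches every dependent image at its next startup.

```python
# Hypothetical sketch: SSF images reference a shared platform
# distribution by name instead of embedding binaries, so a single patch
# reaches every dependent image. All names here are illustrative.

distributions = {"websphere-6.1": "6.1.0.0"}    # one binary distribution

images = [
    {"name": f"config-{i}", "platform": "websphere-6.1"}
    for i in range(100)                          # hundreds of working images
]

# Patch once, centrally.
distributions["websphere-6.1"] = "6.1.0.1"

# Every image picks up the patched distribution at its next startup.
levels = {distributions[img["platform"]] for img in images}
print(levels)   # -> {'6.1.0.1'}
```

Under the Lab Manager model described above, the equivalent operation would touch the binaries baked into each of the hundred virtual machines individually.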
Before we leave this specific discussion, it is important to note that we are not minimizing
numerous benefits offered by VMware virtual infrastructure. In fact, SSF is completely
integrated with VMware virtual infrastructure and leverages many virtualization benefits
offered by VMware in creating and managing the OS image set.
Summary of Comparative Analysis
DASM SSF is unique in its approach of creating a complex, distributed application infrastructure configuration
image from a template and running the SSF image by transiently inserting the image into any relevant OS
image set.
Below we offer a summary of the comparative analysis.
Feature | DASM SSF | VMware Lab Manager | BladeLogic, Tivoli
1-click creation of complex, distributed application infrastructure configurations from templates | Yes | Yes | Yes, but with less automation compared to the other two approaches
Ability to run an application infrastructure configuration image within any relevant virtual or physical infrastructure | Yes | No | No
Ability to copy an image from one environment to another | Yes | Yes, but all environments must be based on VMware virtual infrastructure | No
Ability to save and restore multiple versions of a working image | Yes | Yes | No
Ability to convert a working image into a template | Yes | Yes | No
Ability to patch all images dependent on an application platform by patching a single application platform binary distribution | Yes | No | No
Ability to discard physical or virtual machines while retaining a working image | Yes | Yes | No
Benefits
This brings us to one of the most important topics within this paper: What are the benefits of DASM SSF? In this section, we discuss all the major DASM SSF benefits.
DASM SSF benefits are influenced by a number of business and IT drivers; we will discuss these drivers and note their impact on SSF benefits. As noted at the outset of this paper, a given benefit may not be uniformly applicable across Production and Non-Production environments, so we will note the relative importance of each benefit within Production and Non-Production environments.
DASM benefits can be analyzed from an absolute standpoint, in the context of a standalone SSF solution, or from a comparative standpoint, in the context of other solutions; in this section, we will do both. We present the discussion of benefits below.
Benefit: Agility in creating complex, distributed application infrastructure configurations

Business and IT Drivers: Key business drivers that directly impact this benefit: (1) rate of growth; (2) competitive landscape. Key IT drivers that directly impact this benefit: (1) number of IT applications; (2) rate of change in applications; (3) level of innovation.

Relative Importance: Non-Production: High. Non-production environments are characterized by a high level of stand-up and tear-down of complex configurations, and multiple copies of the same configurations. Production: Medium. Production environments are relatively stable in terms of stand-up and tear-down of complex configurations; the exception to this general rule is that at the roll-out of new applications, the level of flux even in production can be fairly high.

Analysis: The faster the rate of growth, the more intense the competitive landscape, the greater the number of applications, and the higher the rate of change and level of innovation, the greater this benefit. To quantify this benefit, two financial models are relevant: (1) NPV of accelerated earnings achieved through faster time to market; (2) opportunity cost of delays in time to market in the face of competition.
Benefit: Lower infrastructure-related capital and operational expenditure

Business and IT Drivers: Key business drivers that directly impact this benefit: (1) level of infrastructure sharing accepted by business practices; (2) pattern of IT resource consumption by the business over a 24-hour period; (3) regulatory constraints on sharing infrastructure; (4) internal managerial cost accounting practices. Key IT drivers that directly impact this benefit: (1) current level of infrastructure spending; (2) current level of server consolidation; (3) current level of infrastructure virtualization; (4) ability to consolidate disparate application platforms.

Relative Importance: Non-Production: High. The need for repurposing infrastructure is very high, which creates many opportunities for re-use of infrastructure resources for multiple objectives. Production: Medium. A certain base level of resource consumption is always needed, and opportunities for repurposing and reusing infrastructure resources across multiple applications depend on the 24-hour pattern of resource consumption. Shared contingency resources for production environments, and sharing DR environments with non-production use, offer additional opportunities for deriving this benefit in production environments.

Analysis: The more the business is willing to share IT infrastructure resources, especially for non-production environments, the higher the level of this benefit. The level of this benefit is incrementally lower when extensive server consolidation has already been achieved through conventional physical server consolidation or through use of virtualized infrastructure resources. To quantify this benefit, three financial modeling approaches are possible: (1) in the absence of extensive server consolidation already in place, this benefit can be modeled as a conventional server consolidation exercise; (2) if extensive server consolidation is already in place, we can model this benefit in the context of selected application projects that are willing to share resources. To do so, we identify infrastructure
Benefit: Horizontal scaling to satisfy volatile demand

Business and IT Drivers: Key business drivers that directly impact this benefit: (1) nature of load on business applications; (2) impact of volatile demand on the business. Key IT drivers that impact this benefit: (1) can horizontal scaling of application platforms satisfy volatile demand? (2) is overflow capacity available for horizontal scaling?

Relative Importance: Production: High, if horizontal scaling of application platforms can satisfy volatile demand. Non-Production: Low, except in benchmark testing.

Analysis: If applications are architected such that application service levels can be satisfied through horizontal scaling of application platforms, then the level of this benefit is very high.

Benefit: Flexibility in the use of physical and virtual infrastructure resources, OS platforms, and application platforms

Business and IT Drivers: Key business driver that directly impacts this benefit: need for innovation. Key technology driver that directly impacts this benefit: ability to absorb IT innovation.

Relative Importance: Production: Low. Non-Production: High.

Analysis: Conventional solutions such as BladeLogic offer limited flexibility in changing infrastructure resources, OS platforms, and application platforms. Innovation requires change, and this lack of flexibility can carry huge opportunity costs.
System Architecture
The system architecture of SSF comprises the following important components:
- DASM Broker console, which acts as the centralized control center for the framework
- DASM Reporting Database, which stores important data collected by DASM
- Self-Service Portal, which provides the web-based front-end for SSF users
- Self-Service Repository, which stores all the SSF images created and used in the Self-Service Framework; this is the master repository for all SSF images
- Physical and/or Virtual Servers; DASM and the Self-Service Framework can operate on legacy hardware
For Virtual Deployments:
a. VMware vCenter Server, which is used by DASM Broker to manage virtual Resource Pools
and ESX server clusters. DASM Broker uses the vCenter Server to automatically create,
power on, power off, shutdown, and destroy any virtual machines it uses to run various
SSF image components.
b. The virtual machines run in VMware virtual infrastructure Resource Pools, defined within
ESX server clusters
c. A standby pool of ESX servers, which DASM Broker automatically drains and refills through the vCenter Server as demand for ESX server hosts fluctuates within the VMware Resource Pools that the Broker manages
For Non-Virtual Deployments:
a. A pool of physical machines running DASM Engines. These could be legacy x86 machines that are not properly configured for virtual environments, or non-x86 machine types.
The system architecture for SSF is shown in detail in Figure 2 below.
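The standby-pool behavior described above can be sketched in a few lines. This is a hypothetical illustration only (the `rebalance` function and host names are invented, not the DASM Broker or vCenter API): hosts move from the standby pool into active resource pools as demand rises, and back again as it falls.

```python
# Illustrative sketch of draining and refilling a standby pool of ESX
# hosts as demand fluctuates. Not the DASM Broker / vCenter API.

def rebalance(active, standby, hosts_needed):
    """Return the active and standby pools resized to meet demand."""
    while len(active) < hosts_needed and standby:
        active.append(standby.pop())     # draw a host from standby
    while len(active) > hosts_needed:
        standby.append(active.pop())     # return surplus hosts to standby
    return active, standby

active, standby = ["esx1", "esx2"], ["esx3", "esx4", "esx5"]
active, standby = rebalance(active, standby, 4)   # demand rises
print(len(active), len(standby))                  # -> 4 1
active, standby = rebalance(active, standby, 2)   # demand falls
print(len(active), len(standby))                  # -> 2 3
```

In the real deployment, the equivalents of `append` and `pop` would be vCenter operations that power hosts on or off and move them between resource pools.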
Roles and Responsibilities
The functional roles envisaged for the SSF operating model are described below:
System Administrator
- This role is responsible for installing and maintaining DASM Brokers and Engines within physical or virtual infrastructure; for installation and maintenance of operating systems on bare metal or as guest operating systems in virtual machines; and for creation and maintenance of VMware templates containing DASM Engine software
- This role is responsible for integrating DASM Broker with VMware vCenter Server and, if necessary, with existing workflow systems and datacenter reporting tools
- This role is responsible for security configuration of DASM Brokers. This includes authentication configuration based on an internal database or external LDAP directory, and optional SSL configuration of browser communications with the DASM Broker administrator console, programmatic client communication with Broker web services, communication between DASM Brokers and Engines, and communication between DASM Primary and Secondary Brokers
- This role is responsible for ensuring availability of DASM Broker and Engine instances and hosts, and will need to use appropriate monitoring tools and alerting mechanisms to do so
- The skill sets required for this role are general system administration skills for the relevant operating systems and the ability to work with a web-based administrative console
- This role needs to be trained in the installation, maintenance, and system configuration of DASM Broker; it does not need to be trained in the operational and functional aspects of DASM Broker
- The responsibilities for this role are unchanged across development, UAT, pre-production, and production environments
IT Architect and Developer
This role defines the application templates offered under SSF services. This role has the following responsibilities:
- Interact with relevant application development teams and discover requirements for defining a new application template
- Specify the application infrastructure configuration using application template meta-data
- Specify the customizable attributes of the application template that require user input
- Specify the process workflow for orchestrating the startup and shutdown of the SSF image created from a template
- Build from scratch, customize, or procure (from DataSynapse) the following DASM software packages: the DASM Distributions required by the application template components; the DASM Containers required by the application template components; and the application templates required to build the distributed components of the application infrastructure configuration
- Configure default orchestration rules for the startup and shutdown procedures of the SSF image defined by the application templates
www.datasynapse.com p. 16
- The skill sets required for this role are a combination of architect and developer skill sets, including Java, XML, the Eclipse IDE toolset, and an understanding of the relevant application infrastructure platforms for which distributions, containers, and domains are needed
- This skill set requires a combination of architectural skills and hands-on capabilities. If needed, this role may be split into two separate roles: IT Architect and Application Developer
- It is possible to outsource this role to DataSynapse Consulting Services
- The responsibilities for this role are unchanged across development, UAT, pre-production, and production environments
Framework Administrator
- A Framework Administrator is responsible for deploying all software artifacts to a DASM broker
- A Framework Administrator is responsible for defining and administering all DASM Broker policies. This includes adding and removing various DASM domains from various policies
- A Framework Administrator is responsible for scheduling DASM Broker policies or manually
activating and deactivating them, as needed
- A Framework Administrator is responsible for uploading and assigning permissions to application
templates within Self-Service Portal
- A Framework Administrator is responsible for all administrative activities associated with Self-Service
Portal
- The skill sets required for this role are a combination of DASM Broker administration skills and
general control management skills. This role requires an understanding of available infrastructure
and an understanding of resource needs of relevant applications
- The responsibilities for this role are unchanged in development, UAT, pre-production, and production environments. Of course, the relevant policies defined across the various environments will reflect the needs of the associated environments
Framework User
- A Framework User is any person authorized to create and manage SSF images using the SSF web-based User Console
- No special skill set or training is required for this role
Operating Model
First we will enumerate the key elements of the operating model and then elaborate on each element. The
key elements of the operating model are as follows:
- New application templates are defined through an interaction between the SSF team and application development teams, and are offered within the Self-Service Portal as application templates
- Each application template can be used to create an SSF image, which may contain one or more distributed components
- After a new application template is ready for deployment within SSF, a capacity impact analysis is undertaken to estimate the additional capacity required to bring the new application into SSF. Details of the capacity planning model are beyond the scope of this document
- Application templates offered within SSF may be configured for instant provisioning or for administrative workflow processing
- Framework Users submit requests for various application templates through the Self-Service Portal User Console
- Requests for application images configured for automated workflow processing are processed automatically, and Framework Users get access to their requested SSF images in a matter of minutes
- Application image requests requiring approval are processed by Framework Administrators; once processing is complete, Framework Users get access to their requested images
- To deploy applications, Framework Users directly interact with the web-based administrative consoles or OS consoles of the relevant distributed components
- Application images may be bound to an end date, and the application image resources can be released after the end date. The Framework User can extend the life span of an application image by requesting a new end date
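The two provisioning paths in this operating model can be sketched as a small decision function. This is a hypothetical illustration (the function name, field names, and return values are invented): a template configured for instant provisioning is deployed immediately, while other requests wait for Framework Administrator approval.

```python
# Hypothetical sketch of the request flow in the SSF operating model.
# Names and states are illustrative, not the actual SSF implementation.

def process_request(template, approved_by_admin=None):
    """Return the resulting state of an application image request."""
    if template["instant_provisioning"]:
        return "deployed"                 # user gets the image in minutes
    if approved_by_admin is None:
        return "pending-approval"         # Framework Administrator notified
    return "deployed" if approved_by_admin else "rejected"

instant = {"name": "tomcat-dev",     "instant_provisioning": True}
gated   = {"name": "websphere-prod", "instant_provisioning": False}

print(process_request(instant))                        # -> deployed
print(process_request(gated))                          # -> pending-approval
print(process_request(gated, approved_by_admin=True))  # -> deployed
```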
New Application Template Definition Process
The key steps in the new application template definition process are shown in Figure 3.

Figure 3: New Application Template Definition (flowchart, summarized):
1. Gather application infrastructure requirements from application development teams.
2. Specify the application infrastructure, including the application infrastructure topology components and the user inputs for customizing the topology.
3. Validate the application infrastructure specification with application development teams. If the template spec is not valid, return to step 2.
4. Build templates using DataSynapse Studio, then test the application infrastructure template in DataSynapse Studio. If the template tests do not pass, repeat this step.
5. Build the required FabricServer Distributions and Containers using DataSynapse Studio, or procure them from DataSynapse, and customize as needed.
6. Test the Distributions and Containers in FabricServer Broker. If the tests do not pass, return to step 5.
7. Deploy the application infrastructure templates in the Self-Service Portal, specifying either an automated or non-automated process workflow. The application template is then complete.
User Image Request Process
The key steps in the user image request process are shown in Figure 4.

Figure 4: User Request (flowchart, summarized):
1. The user logs on to the SSF portal, selects an offered SSF template, selects a target SSF runtime environment, and creates a new SSF image.
2. If the requested SSF template and target environment allow automated processing, the SSF image is created and deployed within the target environment, and the requesting user is informed through email.
3. Otherwise, the SSF Admin receives an email notification about the user request. If the SSF Admin approves the request, the image is created and deployed and the requesting user is informed through email; if not, the user is informed through email about the rejected request.
4. The user can then access the SSF image components and deploy applications within the relevant components. The user can start up, shut down, redeploy, capture, save captured versions, and restore saved versions of the SSF image. The user request is then complete.
Capture, Save and Restore Application Images
Once a user is informed that a new application image has been created and deployed, the user can start up the image. Once the application image is running within the SSF environment, the user can access the web-based consoles or OS consoles of the relevant image components and deploy their business logic.
Once the configuration of the application components is complete, the application image can be captured and optionally saved into the SSF repository. Any saved version of an image within the SSF repository can be restored to be the current version within the SSF runtime environment.
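The capture, save, and restore operations can be sketched with a minimal repository model. This is a hypothetical illustration (class and method names are invented, not the SSF repository API): captured versions are stored with time stamps, and any saved version can be made the current one again.

```python
# Minimal, hypothetical sketch of the SSF image repository: time-stamped
# saved versions, any of which can be restored as the current version.

class ImageRepository:
    def __init__(self):
        self.versions = []                 # (timestamp, config) pairs

    def save(self, timestamp, config):
        self.versions.append((timestamp, dict(config)))

    def restore(self, timestamp):
        for ts, config in self.versions:
            if ts == timestamp:
                return dict(config)        # becomes the current version
        raise KeyError(timestamp)

repo = ImageRepository()
repo.save("2009-05-01T10:00", {"cluster_members": 2})
repo.save("2009-05-02T15:30", {"cluster_members": 4})

current = repo.restore("2009-05-01T10:00")  # roll back to an older version
print(current)                              # -> {'cluster_members': 2}
```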
Service Levels
Many service level attributes are favorably impacted through SSF, and no service level attribute is negatively impacted by it. Below, we discuss the important service level attributes that are significantly impacted by the SSF operating model.
Mean Recovery Time for SSF Image
SSF offers no explicit service level objectives around mean recovery time for an image, but it is important to note that, barring capacity constraints, the mean recovery time is expected to be consistent with or better than current times. This is because complete recovery of an application image is the same as a shutdown and restart of an image, the mean time for which is generally well known within IT.
It is important not to confuse this mean recovery time with high availability capabilities at the physical infrastructure level, such as VMware HA Cluster or VMware VMotion capability. Any capability that prevents hardware failure is orthogonal to this mean recovery time, which starts only after a hardware- or software-related failure has actually occurred and recovery is needed.
Mean Planned Down Time
SSF offers no explicit service level related to mean planned down time. However, it is important to note that SSF favorably impacts many activities that drive planned down time, such as maintenance and upgrades of the application infrastructure topology, and is therefore expected to reduce mean planned down time considerably.
Key Performance Indicator Thresholds
One of the key service level objectives offered by SSF applications is the measurement of KPIs for selected application infrastructure components and the specification of threshold activation rules. These rules can be used with all SSF application images, whether or not the components are clustered.
When threshold activation rules are used with clustered components, they enable dynamic clustering that automatically stabilizes KPIs around an optimal level. Such KPI-driven dynamic clustering can directly improve the responsiveness of applications hosted within the application infrastructure. The details of this service level are always application specific.
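The shape of such a threshold activation rule can be sketched as follows. The function name, the rule structure, and the numbers are hypothetical illustrations, not DASM configuration syntax; the sketch only shows how a KPI measurement could drive a cluster-size decision:

```python
def evaluate_kpi(kpi_name, measured, threshold, cluster_size,
                 min_size=1, max_size=8):
    """Return a new cluster size based on a simple threshold rule.

    Scales out when the measured KPI exceeds its threshold, scales in
    when the KPI is comfortably below it, otherwise holds steady.
    """
    if measured > threshold and cluster_size < max_size:
        return cluster_size + 1   # KPI above threshold: scale out
    if measured < 0.5 * threshold and cluster_size > min_size:
        return cluster_size - 1   # KPI well below threshold: scale in
    return cluster_size           # KPI stable around the optimal level

# usage sketch: mean response time (ms) for a clustered component
size = 2
size = evaluate_kpi("response_time_ms", measured=480, threshold=250,
                    cluster_size=size)   # above threshold: scale out
```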
Capacity Planning Model
At the end of the day, SSF needs adequate capacity to deliver its offered application infrastructure. This requires a capacity planning model that helps SSF deliver its application infrastructure in a manner that is economically efficient, yet provides service levels adequate to achieve business objectives.
Capacity is measured in CPU, memory, shared storage, and network bandwidth. The essential questions for
any capacity model are as follows:
- How much capacity do I need at this instant, say time t0, to run the applications in scope at a level of throughput and response time that meets business objectives at this instant?
- How much capacity do I need at some future time t1 > t0 to run the applications in scope at a level of throughput and response time that will meet business objectives at time t1?
One conventional approach to capacity planning is to measure historical trends and project them into the future. This works well over short periods, say weeks or months, but works poorly over longer periods, say years, because every trend lasts for a period of time and then changes unpredictably.
The approach of the SSF capacity planning model can be summarized as evidence-based incremental capacity expansion. The details of this model are as follows:
- In the SSF capacity model, the main objective is not to add capacity early, but to add it just in time, and only when SSF provides objective evidence that more capacity is needed
- The SSF capacity model assumes an environment that includes VMware virtual infrastructure and other virtual and physical infrastructure, but in particular it relies on a Standby pool of VMware ESX servers to implement an elastic capacity planning model
- The SSF capacity model assumes a very quick procurement process. Ideally, one should be able to procure a new VMware ESX server resource within 5 - 7 business days
- The SSF capacity model recommends starting out with a small Standby pool and growing capacity in the smallest economically feasible increments. For example, a recommended increment for adding CPU and memory capacity is a 4 dual-core CPU ESX server with 8 GB of memory; the recommended shared storage increment is 1 terabyte
- In the SSF capacity model, the initial capacity of the Standby VMware ESX server pool is recommended to be four machines, where each machine is a 4 dual-core CPU machine with 8 GB of memory. Initial storage is recommended to be 1 TB
- Of these 4 machines, 2 should be online and 2 should be in the Standby ESX server pool (see Figure 2)
- Each SSF product should track relevant KPIs and associate thresholds with them, even when no dynamic clustering is needed. KPI threshold violations are captured in the DASM reporting database and offer objective evidence of actual application throughput and response times
- The SSF system architecture will automatically drain and fill the Standby pool as needed. If the Standby pool is drained down to 1 machine or fewer for at least 30 percent of the time over a week, it is time to procure 1 more machine
- If the Standby pool is drained down to 0 machines for at least 50 percent of the time over a week, it is time to procure 1 more machine and add it to the Standby pool
- If the Standby pool has more than 2 machines for at least 50 percent of the time over a week, it is time to remove 1 machine from the Standby pool, as long as a minimum of 2 machines is left in the pool
- If any topology component shows KPI threshold violations that are not automatically resolved, consider adjusting the cluster size for clustered topology components and then analyze the impact on the Standby pool
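The weekly drain-and-fill rules above can be sketched as a simple evaluation function. This is an illustrative sketch only; the function and parameter names are hypothetical, while the percentage thresholds are taken directly from the rules above:

```python
def weekly_pool_action(frac_at_most_one, frac_at_zero, frac_above_two,
                       pool_size, min_pool=2):
    """Decide a Standby pool action from one week of observations.

    frac_at_most_one -- fraction of the week the pool had <= 1 machine
    frac_at_zero     -- fraction of the week the pool had 0 machines
    frac_above_two   -- fraction of the week the pool had > 2 machines
    """
    if frac_at_zero >= 0.50:
        return "procure 1 machine"   # pool empty half the week
    if frac_at_most_one >= 0.30:
        return "procure 1 machine"   # pool nearly empty too often
    if frac_above_two >= 0.50 and pool_size - 1 >= min_pool:
        return "remove 1 machine"    # sustained excess capacity
    return "no change"               # pool size is adequate

# usage sketch: pool drained to <= 1 machine 35% of the week
action = weekly_pool_action(0.35, 0.10, 0.0, pool_size=2)
```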