Approaching continuous availability in WebSphere Process Server V7

Maintaining availability during application updates and fix pack installations

This article provides background, insights, and a pragmatic set of techniques for installing application updates and product fix packs in WebSphere® Process Server V7.0 environments where continuous availability is desired. This content is part of the IBM Business Process Management Journal.

Jacob Stoeffler is a student at the University of Wisconsin-Madison, studying computer science. He spent part of 2012 as an intern at IBM in Rochester, MN, working in the BPM and ODM CTO office.

Eric Herness is an IBM Distinguished Engineer and is the Chief Architect for business process management (BPM) in IBM Software Group. Eric is also the CTO for the business unit focused on BPM and operational decision management (ODM), where he leads the architects who define product and technical direction for the business. Eric has worked with many large customers as they have adopted BPM and ODM approaches. He has had key lead architectural roles in WebSphere for more than 15 years. Eric has an MBA from the Carlson School at the University of Minnesota.

22 May 2013

Introduction

Business process management applications are becoming increasingly mission-critical as organizations gain experience with how best to leverage processes to help run and improve their business. As a result, availability requirements for these applications often approach 24/7. BPM also encourages continuous process improvement, which means that application changes arrive more rapidly and in a variety of forms. Mechanisms and techniques are needed to minimize downtime and approach continuous availability. These techniques should be automated as much as possible to improve consistency and speed and to reduce the chance of a keying mistake or other human error.

This article provides background, insights, and a pragmatic set of techniques for approaching continuous availability in WebSphere Process Server V7.0.

First, we review the topology for WebSphere Process Server, highlighting and detailing the special considerations that set the stage for the approach suggested in this article. Then we present application background. Because an IBM BPM solution running on WebSphere Process Server involves many different kinds of artifacts, different techniques are applied to achieve the desired results. Some constraints and special settings are also required at the solution and application level. A detailed explanation is provided for each scenario, including the steps involved in executing it. This mainline description of the scenarios and the basic techniques applied serves as a baseline for exploring variations related to store-and-forward and alternative topology configurations. We also enumerate additional insights into the scenarios executed and some of their implications. Finally, we summarize how to apply a similar approach to product fix packs.

Before attempting any of the procedures outlined here in a production environment, we highly recommend that you test them thoroughly. In addition to the topology, applications must be created in a manner that enables the pursuit of continuous availability. This article provides some specific information, but assumes that the application handles interface changes, required database tables, and other resource requirements appropriately. Put another way, some application changes break compatibility and thus require downtime. The focus of this article is on compatible changes: for example, operations are added, not removed or changed, and application-specific databases are not changed. Changes such as removing operations or changing databases are outside the scope of this article.

Background of Process Server topology

Figure 1 shows what a typical WebSphere Process Server configuration may look like. For the sake of simplicity, throughout this article we will assume a single-cell topology similar to this one. We cannot possibly cover each and every configuration scenario, but we hope to provide a general set of practices that can be applied to the vast majority of configurations.

Figure 1. A two-node, four-cluster topology with IBM HTTP Server

Any WebSphere Process Server topology uses a set of four key functions: application deployment target, messaging infrastructure, supporting infrastructure, and web infrastructure. A cluster is a set of servers that performs one or more of those four functions. It is possible to have one cluster perform all four functions. However, separating these functions across multiple clusters increases resiliency and therefore improves your chances of attaining continuous availability.

In the topology shown in Figure 1, a separate cluster is dedicated to each of the four functions. The application deployment target cluster (AppTarget) consists of the servers on which your applications are installed. These applications may include business processes, services, human tasks, and mediations. The remote messaging cluster (Messaging) provides support for asynchronous messaging for your applications and for the needs of internal Process Server components. The support infrastructure cluster (Support) provides function that is complementary to the deployment target; as a separate cluster, it provides isolation and relieves the deployment target of specific workload. Finally, the web component cluster (WebApp) hosts web-based client applications such as Business Space, Business Process Choreographer tools, and REST API services.

This topology consists of a deployment manager and two Process Server nodes. The deployment manager manages the cell and provides an interface for configuring the various components in the cell. It is also responsible for installing and updating applications, and is therefore central to any effort to achieve continuous availability.

A node hosts one or more application servers; in this topology, each node hosts one member of each of the four clusters. Each node is remotely managed by the deployment manager through a node agent: the deployment manager communicates with a node's node agent, which in turn communicates with that node's application servers.

In WebSphere Process Server, applications are installed through the deployment manager using either the Integrated Solutions Console or the wsadmin command-line scripting tool. When an application is installed or updated, the new application's artifacts are first stored in the cell's master repository. The deployment manager then passes the new artifacts to each node's node agent, and each node agent updates the application on that node's application servers.

IBM HTTP Server is commonly used for handling HTTP traffic in IBM BPM environments. It lets you configure traffic routing to application servers through a file named plugin-cfg.xml. A client connects to the HTTP server, which routes the client's requests to one of the cell's application servers, depending on the load that each server is configured to handle. We will cover IBM HTTP Server configuration in more detail when we discuss updating asynchronous applications.

In this basic overview of WebSphere Process Server topology, we have attempted to review at a high level what you need to know to understand and successfully use the procedures covered later. For a more in-depth look at Process Server topologies, including installation and configuration steps, refer to the IBM Redbook WebSphere Business Process Management V7 Production Topologies.

The role of topology in continuous availability scenarios

A key to approaching continuous availability in WebSphere Process Server is minimizing downtime during application updates. Our goal is to update an application without clients experiencing any downtime, and preferably without their noticing anything at all. For some types of applications this is simple and requires no special preparation. For most Process Server applications, however, a zero-downtime update requires planning and a special procedure that involves routing incoming application traffic to and away from individual nodes. We will cover this procedure in detail later, but first we outline the Process Server topology that makes it possible.

A prerequisite to continuous availability is what we call high availability. To approach continuous availability, we recommend a topology similar to the one shown in Figure 1: a two-node, four-cluster topology with one member of each cluster on each node. At a minimum, you must have an application target cluster that is separate from the messaging cluster. This ensures that the messaging engine can fail over independently of the application target servers. You must also have at least two separate nodes that host members of the application target and messaging clusters. This allows you to stop the servers on one node and update an application while the other node handles all of the application traffic. These recommendations provide the foundation for approaching continuous availability and also serve as a solid base for high availability.

Background of Process Server applications

WebSphere Process Server V7.0 applications are usually organized into modular units that are developed and deployed as Service Component Architecture (SCA) modules. A module can contain a variety of SCA components, the basic building blocks that encapsulate business logic and are exposed to other components through an interface. Commonly used component types in WebSphere Process Server include Business Process Execution Language (BPEL) processes, mediation flow components (MFCs), and service components.

WebSphere Process Server application modules are typically developed in WebSphere Integration Developer. Once a module is developed, it is exported as an enterprise archive (EAR) file. The EAR is usually deployed first to a WebSphere Process Server test environment, where it is tested both in isolation and with the application as a whole. Once the module has passed thorough testing, it is deployed to a production environment. Although the main focus of this article is on this deployment phase, some concepts at the application development level must be understood in order to successfully approach continuous availability.

SCA invocation styles

SCA components can be invoked either synchronously or asynchronously. Synchronous invocation means that the caller is blocked until a response is received from the component being called. Because the service requester and the service provider run in the same thread, all processing by the requester is suspended until it receives a response from the provider.

Figure 2. Synchronous SCA invocation

Synchronous SCA invocation is useful if the requester depends on receiving a response from the provider in order to continue processing.

Asynchronous invocation allows the caller to invoke a service without waiting for a response to be produced right away. The service requester and the service provider run in different threads, so the requester can continue processing while the provider prepares a response.

Figure 3. Asynchronous SCA invocation

Asynchronous invocation is useful if (1) the provider may take a long time to respond (minutes, hours, or days) or (2) the requester has further processing it can perform that does not depend on the information returned from the provider. There are three flavors of SCA asynchronous invocation that can be used in WebSphere Process Server applications: one-way invocation, callback invocation, and deferred response invocation. For a review of these asynchronous invocation styles and how they apply to WebSphere Process Server, see the developerWorks article Asynchronous Processing in WebSphere Process Server.
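The three asynchronous styles differ mainly in how, and whether, the requester receives the response. As a loose analogy only (this uses Python's standard concurrent.futures module, not the SCA programming model), the sketch below contrasts a synchronous call with one-way, callback, and deferred-response invocation. The provider function and its return value are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def provider(order_id):
    """Hypothetical service provider that takes a moment to respond."""
    time.sleep(0.05)
    return f"order {order_id} confirmed"


callback_results = []

with ThreadPoolExecutor(max_workers=4) as pool:
    # Synchronous: the requester's thread is blocked until the provider returns.
    sync_result = provider(1)

    # One-way: send the request and never look at a response.
    pool.submit(provider, 2)

    # Callback: the response is handed to a handler whenever it arrives.
    pool.submit(provider, 3).add_done_callback(
        lambda fut: callback_results.append(fut.result())
    )

    # Deferred response: keep a handle, do other work, then collect the result.
    handle = pool.submit(provider, 4)
    other_work = sum(range(1_000))  # the requester keeps processing meanwhile
    deferred_result = handle.result(timeout=5)

# Leaving the "with" block waits for all submitted work, including the callback.
print(sync_result, deferred_result, callback_results)
```

In SCA the thread handoff is carried by messaging rather than a local thread pool, which is exactly why asynchronous modules need the extra care described later in this article.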

Early binding and late binding for BPEL process invocation

A special case to consider is the invocation of BPEL business processes. A client (caller) of a BPEL process can be configured to use either early or late binding. Early binding means that the client is hard-wired to a specific version of the process and will only invoke that version. When updating a process that is invoked with early binding, you must also update the client to use the new process version. With late binding, a client always invokes the most current process version; the decision about which version to invoke is made dynamically at runtime.

Early and late binding can be configured in several ways, depending on how the client invokes the BPEL process. One way to invoke BPEL processes is through the Business Process Choreographer API. In this case, controlling the type of binding is as simple as calling the proper invocation method. In the Business Process Choreographer API, invocation methods generally have two versions with different signatures: one that takes a process template name and one that takes a template ID. The methods that take a template name as a parameter use late binding, while the methods that take a template ID use early binding.
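The difference between the two method families can be modeled as two lookups over the deployed versions of a process template. The sketch below is a simplified model, not the Business Process Choreographer API: the ProcessTemplate class and the template IDs are hypothetical, and the rule that late binding selects the version with the most recent valid-from date that is not in the future is stated here as an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ProcessTemplate:
    template_id: str      # unique per deployed version: the early-binding key
    name: str             # shared by all versions: the late-binding key
    valid_from: datetime  # when this version may start being used


def resolve_early(templates, template_id):
    """Early binding: always return the exact version the client is tied to."""
    for template in templates:
        if template.template_id == template_id:
            return template
    raise LookupError(f"no template with ID {template_id!r}")


def resolve_late(templates, name, now):
    """Late binding: pick the currently valid version at call time."""
    valid = [t for t in templates if t.name == name and t.valid_from <= now]
    if not valid:
        raise LookupError(f"no valid version of {name!r} at {now}")
    return max(valid, key=lambda t: t.valid_from)


# Hypothetical deployment: v2 was installed in advance with a future valid-from date.
TEMPLATES = [
    ProcessTemplate("PT_v1", "OrderProcess", datetime(2013, 1, 1)),
    ProcessTemplate("PT_v2", "OrderProcess", datetime(2013, 6, 1)),
]

print(resolve_late(TEMPLATES, "OrderProcess", datetime(2013, 5, 31)).template_id)
print(resolve_late(TEMPLATES, "OrderProcess", datetime(2013, 6, 1)).template_id)
print(resolve_early(TEMPLATES, "PT_v1").template_id)
```

Under this model, a late-bound caller switches to v2 at its valid-from instant without being redeployed, while an early-bound caller keeps using v1 until the caller itself is changed.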

BPEL processes can also be invoked through an SCA wire from a calling component to the process component. By default, invocation of a BPEL process from an SCA component is early bound, because the SCA wire is tied to a specific version of a process component. It is possible to achieve late-bound invocation through SCA by using a "proxy process." Details on this technique can be found in Creating versions of your process to be used with SCA components and exports in the WebSphere Integration Developer Information Center. The proxy-process technique is generally not recommended, however, because it creates additional process instances, which can affect performance.

A third case is the invocation of a BPEL process from another BPEL process, which is done by adding an invoke activity to the calling process. To make the invocation early bound, use an SCA wire to connect the two process components. To use late binding, do not use a static SCA wire; instead, specify the template name of the target process as part of the reference partner properties of the invoke activity. For more information, see Late binding using a partner link extension in the WebSphere Integration Developer Information Center.

Updating a BPEL business process

This section applies if you only need to update a BPEL business process within a Process Server application. If changes to other types of components within SCA modules are required, refer to the next two sections on updating SCA modules.

Thanks to BPEL versioning, it is quite simple to update a BPEL process while maintaining continuous availability of your application. The only requirements are that (1) the caller of the process is configured to use late binding and (2) the old and new versions of the process have matching component names and target namespaces. These requirements are normal best practice, so they impose no significant limitation or inconvenience. If they are fulfilled, the new process version is picked up seamlessly the moment it becomes valid.



When updating a BPEL process, you also need to decide whether already running process instances should be migrated to the new process version. Although migration generally doesn't affect availability, we suggest migrating instances incrementally so as not to overload your system.
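Incremental migration simply means working through the running instances in small batches with a pause between them. The sketch below assumes a hypothetical migrate_batch callable that wraps whatever migration mechanism you actually use; the instance IDs, batch size, and pause length are illustrative values that you would tune with your own load tests.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items, preserving order."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate_incrementally(
    instance_ids: Iterable[str],
    migrate_batch: Callable[[List[str]], None],  # hypothetical migration hook
    batch_size: int = 50,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Migrate instances in batches, pausing between batches; return the count migrated."""
    migrated = 0
    for batch in batched(instance_ids, batch_size):
        if migrated:
            sleep(pause_seconds)  # let the system absorb the previous batch
        migrate_batch(batch)
        migrated += len(batch)
    return migrated
```

Passing the sleep function as a parameter keeps the pacing logic testable without real delays.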

If you decide to migrate running instances, follow the guide Create a new version of your process – migrate running instances to create your new process version. Otherwise, follow the steps in Create a new version of your process – running instances use the old version. We recommend setting the "valid-from" time so that the new process version can be installed before it becomes valid.

Once the new process version is created, you can deploy the uniquely named SCA module just as you would any other module, using either the Integrated Solutions Console or wsadmin scripting. Be sure to install the new process module as a new application rather than as an update to an existing application. If you follow these steps, your new process version will be picked up without a hitch the moment it becomes valid. The old process module can be safely removed once you are sure that no other clients are early bound to that process version, that there are no running instances, and that there are no lingering failed events or messages.
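The removal conditions above lend themselves to a simple checklist that reports why an old module is not yet safe to remove. In this sketch, OldVersionStatus is a hypothetical container; how you gather the counts (console, scripts, queries) is outside its scope.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OldVersionStatus:
    """Facts to gather (by whatever means you use) before removing an old process module."""
    early_bound_clients: List[str] = field(default_factory=list)
    running_instances: int = 0
    failed_events: int = 0
    pending_messages: int = 0


def removal_blockers(status: OldVersionStatus) -> List[str]:
    """Return the reasons the old module cannot be removed yet (empty means safe)."""
    blockers = []
    if status.early_bound_clients:
        blockers.append("early-bound clients: " + ", ".join(status.early_bound_clients))
    if status.running_instances:
        blockers.append(f"{status.running_instances} running instance(s)")
    if status.failed_events:
        blockers.append(f"{status.failed_events} failed event(s)")
    if status.pending_messages:
        blockers.append(f"{status.pending_messages} pending message(s)")
    return blockers


print(removal_blockers(OldVersionStatus()))
print(removal_blockers(OldVersionStatus(running_instances=3)))
```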

Updating an SCA module in a synchronous Process Server application

It is also rather straightforward to update an SCA module within a strictly synchronous application while maintaining continuous availability. By "strictly synchronous," we mean that all invocations within the application are synchronous. The majority of Process Server applications do not fit into this category. For applications that contain any asynchronous invocations, we recommend that you use the procedure described in the next section of this article.

Our tests of synchronous application updates indicate that continuous availability can be achieved during an in-place update without any complicated procedures or special configurations. We highly recommend, however, testing this with your own application in your own test environment before attempting it in a production environment.

The steps for this in-place module update are nearly the same as for any other Process Server application update. First, make the modifications to your SCA module as you normally would. You can then deploy your updated module using either the Integrated Solutions Console or wsadmin scripting.

You may be accustomed to stopping your applications or servers before applying updates, but you will not do so here. To deploy the updated module, simply use WebSphere's Update feature, then save and synchronize the changes with the nodes. This installs the updated module to your servers simultaneously, without any downtime. We do not recommend using the Rollout Update feature in this case. Rollout Update automatically pauses or stops each application server to apply the update sequentially, and pausing or stopping servers here would cause a period of unavailability.

As long as all invocations in your application are synchronous, there is very low risk of failure with this method. If you do encounter difficulties while testing it, refer to the next section for a technique that can be safely used with all application types.

Updating an SCA module in a Process Server application that uses asynchronous or mixed invocation styles

Approaching continuous availability becomes more complicated when you need to update an SCA module in an application that uses asynchronous invocation. The procedure we discuss in this section pertains to all Process Server applications that contain asynchronous invocations or a mixture of synchronous and asynchronous invocations. This includes applications that use any of the three flavors of asynchronous invocation mentioned earlier in this article.

The reason that approaching continuous availability is more complicated in this case is that asynchronous invocation requires messaging support. Multiple transactions are inherent in any asynchronous model, and queues are also part of the picture. Together, these elements make this a more complex scenario than straight synchronous invocation. During an update, the destination queues of an asynchronously invoked SCA module may be marked for deletion, and any messages contained in those queues could be lost. This means that a direct in-place update is risky if the application is actively processing work. Therefore, for updates of this type we must be able to control the flow of traffic and update the application on a particular node only after work has ceased. Our approach is to perform the application update one node at a time so that availability is maintained throughout.

First we give some background on various WebSphere and WebSphere Process Server concepts that apply specifically to this update procedure. Make sure that you fully understand these concepts before attempting the procedure. Because automating the procedure helps avoid inconsistencies and minimizes the time required, we provide sample scripts along the way. We then detail the update procedure itself at the end of this section.

Node synchronization

We can control when a node receives a new version of an application by disabling or enabling node synchronization. If we disable node synchronization on a node, the node is not aware of changes in the master repository of application artifacts, and thus no applications on that node are updated. When we want an application update to occur, we re-enable node synchronization; when synchronization is triggered, the new artifacts are pulled down to the node's servers.

We have found that it is necessary to restart the node agent after disabling or enabling node synchronization for the change to take effect; thus we also provide sample scripts for disabling, enabling, and restarting a node agent in the Download section of this article.
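The effect of disabling synchronization can be pictured with a small in-memory model of a cell: updates land in the master repository, and a synchronization pass copies them only to nodes whose synchronization is enabled. This is a sketch for reasoning about the procedure, not WebSphere's implementation; the Cell and Node classes are hypothetical, and the node agent restart that the product requires is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    name: str
    sync_enabled: bool = True
    apps: Dict[str, str] = field(default_factory=dict)  # app name -> deployed version


@dataclass
class Cell:
    master: Dict[str, str] = field(default_factory=dict)  # master repository
    nodes: Dict[str, Node] = field(default_factory=dict)

    def update_app(self, app: str, version: str) -> None:
        """Save a new version to the master repository only (no sync)."""
        self.master[app] = version

    def synchronize(self) -> None:
        """One sync cycle: only nodes with synchronization enabled pull changes."""
        for node in self.nodes.values():
            if node.sync_enabled:
                node.apps.update(self.master)


cell = Cell(master={"appX": "v1"})
for name in ("NodeA", "NodeB"):
    cell.nodes[name] = Node(name, sync_enabled=False, apps={"appX": "v1"})

cell.update_app("appX", "v2")
cell.synchronize()                       # no node changes: sync is disabled everywhere
cell.nodes["NodeA"].sync_enabled = True
cell.synchronize()                       # only Node A picks up v2
print({n.name: n.apps["appX"] for n in cell.nodes.values()})
```

The end state, with v2 on Node A and v1 still on Node B, is the intermediate state the node-by-node procedure relies on.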

Routing incoming HTTP traffic

Inbound traffic to WebSphere Process Server can come in a variety of forms, such as HTTP, JMS, and MQ traffic. In this section, we discuss HTTP traffic. We need the ability to route inbound HTTP traffic to servers on certain nodes and temporarily prevent traffic from reaching servers on other nodes. We demonstrate how to do this using IBM HTTP Server because of its simplicity and widespread use in BPM environments.

IBM HTTP Server sprays web requests among application servers according to its HTTP plug-in configuration file, plugin-cfg.xml. For a detailed explanation of how this works, refer to the technote Understanding IBM HTTP Server plug-in Load Balancing in a clustered environment.

Of particular importance is the LoadBalanceWeight attribute of the <Server> element. Setting a server's LoadBalanceWeight to 0 stops the server from receiving requests from new sessions. Affinity requests from existing sessions may continue to be routed to the server if session replication is not configured. However, all requests from new sessions are routed to servers whose LoadBalanceWeight is greater than 0.
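To see why a weight of 0 drains a server without cutting off existing users, consider the toy router below. It is a deliberate simplification, not the plug-in's actual load-balancing algorithm: sessions with affinity stay where they are (no session replication is assumed), and new sessions are spread only over servers whose weight is greater than 0. The server names follow the A.App and B.App example used in this section.

```python
from typing import Dict, Optional


class ToyRouter:
    """Simplified model of weight-based routing; not the IHS plug-in's real algorithm."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = dict(weights)  # server name -> LoadBalanceWeight
        self._next = 0

    def route(self, affinity: Optional[str] = None) -> str:
        # An existing session keeps going to the server it is bound to.
        if affinity is not None and affinity in self.weights:
            return affinity
        # A new session is assigned only to servers with weight > 0;
        # each server appears in the rotation once per unit of weight.
        rotation = [name for name, w in self.weights.items() for _ in range(w)]
        if not rotation:
            raise RuntimeError("no server is accepting new sessions")
        server = rotation[self._next % len(rotation)]
        self._next += 1
        return server


router = ToyRouter({"A.App": 2, "B.App": 0})
print({router.route() for _ in range(10)})  # new sessions reach only A.App
print(router.route(affinity="B.App"))       # an existing B.App session stays put
```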

One way to achieve this is to keep multiple plugin-cfg.xml variants readily available on your IBM HTTP Server and swap them in when necessary. For example, assume that requests are normally routed to two application servers: A.App and B.App. In this case, we need three XML configuration files: one that allows requests to both servers (both.xml), one that routes requests only to A.App (a.xml), and one that routes requests only to B.App (b.xml).
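Rather than maintaining three copies by hand, the variants can be derived from one generated plugin-cfg.xml by setting LoadBalanceWeight to 0 on the server to be drained. The sketch below uses Python's standard xml.etree.ElementTree; the trimmed TEMPLATE is a hypothetical stand-in for a real generated file, which contains many more elements. Only <Server> elements that are direct children of <ServerCluster> are changed, so name-only <Server> references nested elsewhere (for example under <PrimaryServers>) are left untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed plugin-cfg.xml used as the "both" baseline.
TEMPLATE = """<Config>
  <ServerCluster Name="AppTarget">
    <Server Name="A.App" LoadBalanceWeight="2"/>
    <Server Name="B.App" LoadBalanceWeight="2"/>
    <PrimaryServers>
      <Server Name="A.App"/>
      <Server Name="B.App"/>
    </PrimaryServers>
  </ServerCluster>
</Config>"""


def drain(xml_text: str, drained: set) -> str:
    """Return a copy of the config in which the named servers get no new sessions."""
    root = ET.fromstring(xml_text)
    for server in root.iterfind("ServerCluster/Server"):
        if server.get("Name") in drained:
            server.set("LoadBalanceWeight", "0")
    return ET.tostring(root, encoding="unicode")


variants = {
    "both.xml": drain(TEMPLATE, set()),
    "a.xml": drain(TEMPLATE, {"B.App"}),  # new sessions only to A.App
    "b.xml": drain(TEMPLATE, {"A.App"}),  # new sessions only to B.App
}
```

ElementTree drops comments and the XML declaration when it reserializes, so review the generated files before placing them on the HTTP server.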



In normal operation, the IBM HTTP Server plugin-cfg.xml contains the contents of both.xml. When the time comes to perform an application update on one of our servers, we simply swap out the contents of plugin-cfg.xml with one of the other configuration files. For example, if we want all requests to be routed to A.App, we replace plugin-cfg.xml with a.xml. IBM HTTP Server (IHS) picks up the configuration change seamlessly and stops routing new sessions to B.App. Normally there is a delay before IHS detects configuration changes, but we can use the following command to gracefully reload the configuration immediately:

    IBM/HTTPServer/bin/apachectl -k graceful

(See ihs_route_to_node_a.sh provided for download.)
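The swap-and-reload step is small enough to script in any language; the download provides a shell version. As an alternative sketch, the Python function below copies a prepared variant to a staging file, swaps it over plugin-cfg.xml with os.replace so that a reader never sees a half-written file, and then runs the reload command. The paths and installation directory in the usage comment are hypothetical; substitute your own.

```python
import os
import shutil
import subprocess


def activate_routing(variant_path: str, plugin_cfg_path: str, reload_cmd: list) -> None:
    """Make `variant_path` the active plug-in config, then reload IHS gracefully."""
    staging_path = plugin_cfg_path + ".new"
    shutil.copyfile(variant_path, staging_path)  # write the full copy first
    os.replace(staging_path, plugin_cfg_path)    # then swap it in atomically
    subprocess.run(reload_cmd, check=True)       # raises if the reload command fails


# Hypothetical usage, routing all new sessions to A.App:
# activate_routing(
#     "/opt/IBM/HTTPServer/Plugins/config/webserver1/a.xml",
#     "/opt/IBM/HTTPServer/Plugins/config/webserver1/plugin-cfg.xml",
#     ["/opt/IBM/HTTPServer/bin/apachectl", "-k", "graceful"],
# )
```

Because the staging file sits in the same directory as plugin-cfg.xml, the replace stays on one file system, which is what makes it atomic on POSIX systems.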

Routing other traffic

After inbound HTTP traffic has been rerouted, a cluster member can still receive new work through JMS or MQ traffic. A cluster member cannot shut down until it has finished processing work, so we also need a way to stop new JMS and MQ traffic from flowing in. We can do this by deactivating the cluster member's J2CMessageEndpoints, as demonstrated in the script quiesce_traffic_jms_mq.jacl provided for download.

If BPEL processes are installed and running, we also need to stop traffic generated by the BPEL scheduler. This can be accomplished by stopping the BPEScheduler, as demonstrated in the script quiesce_traffic_bpel.jacl provided for download.

If you have created any other schedulers, you should stop them too. You can script this as demonstrated with the BPEScheduler script above.

Stopping and starting servers gracefully

Once all incoming traffic has stopped, we can safely stop the cluster members on a node. We recommend stopping a node's cluster members in the following order: application target cluster, web cluster, support cluster, and messaging cluster. This can be scripted with wsadmin or done through the Integrated Solutions Console, but again we recommend scripting as much as possible for greater consistency. You can use the script stop_cluster_members.jacl provided for download.

After updating an application on a node, we need to start its cluster members again. We recommend starting cluster members in the reverse order of stopping them: messaging cluster, support cluster, web cluster, and application target cluster. After restarting a cluster member, you do not need to reactivate J2CMessageEndpoints or restart the BPEScheduler, because this is all done automatically at server startup. (See start_cluster_members.jacl provided for download.)
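The ordering rule is easy to encode so that stop and start scripts cannot drift apart. The sketch below derives both sequences from a single list, using the cluster names from Figure 1. The "stop_member" and "start_member" labels are hypothetical placeholders for whatever your wsadmin scripts actually do, and clusters the application does not use are skipped.

```python
from typing import Iterable, List, Tuple

# Stop order recommended in this article; the start order is simply the reverse.
STOP_ORDER = ["AppTarget", "WebApp", "Support", "Messaging"]


def stop_plan(node: str, clusters_used: Iterable[str]) -> List[Tuple[str, str, str]]:
    used = set(clusters_used)
    return [("stop_member", cluster, node) for cluster in STOP_ORDER if cluster in used]


def start_plan(node: str, clusters_used: Iterable[str]) -> List[Tuple[str, str, str]]:
    used = set(clusters_used)
    return [("start_member", cluster, node)
            for cluster in reversed(STOP_ORDER) if cluster in used]


for step in stop_plan("NodeA", ["Messaging", "AppTarget"]):
    print(*step)
```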

Failed events

During our testing of this procedure in WebSphere Process Server V7.0.0.3, we occasionally saw SCA failed events occur during messaging engine failover. In our experience, however, the failed events can be resubmitted successfully after the update is complete. The failed event manager is accessible in the Integrated Solutions Console, but it is also possible to resubmit failed events using wsadmin scripting. (See resubmit_failed_events.jython provided for download.)

Store-and-forward

If you want to minimize the generation of failed events, you can use the store-and-forward feature that is new in WebSphere Process Server V7.0. With this feature, only one failed event is generated if there are runtime errors: once a runtime error occurs, a store is triggered, and all of the service's subsequent requests are stored in a queue rather than being submitted. You can later forward these requests to their destinations using the Store and Forward widget in Business Space. There is currently no public store-and-forward API that supports changing the store/forward state, so we do not advocate scripting this step, though we have verified that it is possible to do so. For more details on how to use store-and-forward, see the developerWorks tutorial Using the store-and-forward feature in WebSphere Process Server v7.0.
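The store-and-forward behavior amounts to a small state machine. The toy model below is neither the product's implementation nor its API: a RuntimeError stands in for a ServiceRuntimeException, flaky_service is a hypothetical target, and forward() plays the role of the operator action in the Store and Forward widget. It shows that a burst of requests during an outage yields a single failed event, with the rest held for later delivery.

```python
from collections import deque
from typing import Any, Callable, List


class StoreAndForwardModel:
    """Toy model: the first runtime error becomes a failed event, later requests are stored."""

    def __init__(self, target: Callable[[Any], Any]):
        self.target = target
        self.storing = False
        self.stored: deque = deque()
        self.failed_events: List[Any] = []

    def submit(self, request: Any) -> Any:
        if self.storing:
            self.stored.append(request)  # held instead of being submitted
            return "stored"
        try:
            return self.target(request)
        except RuntimeError as error:
            self.failed_events.append((request, error))  # the single failed event
            self.storing = True                          # the store is triggered
            return "failed"

    def forward(self) -> List[Any]:
        """Release stored requests in order, then resume normal submission."""
        results = []
        while self.stored:
            results.append(self.target(self.stored[0]))  # remove only after success
            self.stored.popleft()
        self.storing = False
        return results


service_up = False


def flaky_service(order):
    """Hypothetical target that fails while the messaging engine fails over."""
    if not service_up:
        raise RuntimeError("messaging engine unavailable")
    return f"processed {order}"


saf = StoreAndForwardModel(flaky_service)
outcomes = [saf.submit(n) for n in (1, 2, 3)]
service_up = True
forwarded = saf.forward()
print(outcomes, forwarded, len(saf.failed_events))
```

The one failed event (request 1 here) would still be resubmitted through the failed event manager, as described in the previous subsection.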

Application-level settings

While testing this procedure in WebSphere Process Server V7.0.0.3, we identified a few application-level settings that were most successful for asynchronous applications with certain attributes. Most failures we encountered occurred during failover of the messaging engine between cluster members. For the update procedure to be successful, it is important that your application gracefully handles messaging engine failover. Following the recommendations we give here is a good start, although, given the variability of IBM BPM applications and environments, your scenario may require slightly different settings. Therefore, we specifically recommend that you test your application under load during messaging engine failover to ensure that failover is handled properly and important messages are not lost.

First, if your module contains any mediation flow components, ensure that the fail terminals are connected to an error-handling component. This ensures that failed events are saved and that failed transactions are rolled back when necessary. If your module uses store-and-forward, we recommend setting the store-and-forward qualifier so that all ServiceRuntimeExceptions are caught. Many types of runtime exceptions could occur during the update procedure, and you want to be sure to catch all of them. If your module contains any one-way asynchronous invocations, we recommend setting the reference's asynchronous invocation qualifier to call rather than commit (call is the default setting). Lastly, we recommend setting the reference's asynchronous reliability qualifier to assured (persistent) for any type of asynchronous invocation. These last two recommendations help to ensure that no messages are lost if failures occur during the update procedure.

We highly recommend these settings if your asynchronous module has any of the attributes described above. Again, we urge you to test thoroughly, because what is necessary for one application and environment may be different for another.
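The four recommendations can be turned into a simple pre-deployment review. In the sketch below, the module description is a hypothetical dictionary you would fill in from your own module metadata (the code does not read WebSphere Integration Developer artifacts), and the qualifier values "call"/"commit" and "assured"/"bestEffort" are spelled as assumptions.

```python
from typing import Dict, List


def review_module(module: Dict) -> List[str]:
    """Return findings that deviate from the recommended application-level settings."""
    findings = []
    if module.get("has_mediation_flows") and not module.get("fail_terminals_wired"):
        findings.append("wire mediation flow fail terminals to an error-handling component")
    caught = module.get("store_and_forward_catches")  # None means store-and-forward unused
    if caught is not None and "ServiceRuntimeException" not in caught:
        findings.append("store-and-forward qualifier should catch all ServiceRuntimeExceptions")
    for ref in module.get("async_references", []):
        if ref.get("style") == "one-way" and ref.get("async_invocation", "call") != "call":
            findings.append(f"{ref['name']}: set asynchronous invocation to call")
        if ref.get("reliability") != "assured":
            findings.append(f"{ref['name']}: set reliability to assured (persistent)")
    return findings


example = {
    "has_mediation_flows": True,
    "fail_terminals_wired": True,
    "store_and_forward_catches": ["ServiceRuntimeException"],
    "async_references": [
        {"name": "NotifyPartner", "style": "one-way",
         "async_invocation": "commit", "reliability": "bestEffort"},
    ],
}
for finding in review_module(example):
    print(finding)
```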

Update procedure for SCA modules within asynchronous applications

Now that we have discussed the important elements of the update procedure individually, we detail the procedure itself. Follow these steps to approach continuous availability during module updates for applications containing asynchronous or mixed-style invocations. After each step is a figure that reflects the state of the cell's components after the step is completed.

Before starting the update, the Process Server cell is in its normal operational state. All JVMs are running, including the deployment manager, all cluster members, and all node agents. Automatic node synchronization is enabled for both nodes, as represented by the blue lines between the node agents and the deployment manager. The messaging engine (ME) is active on the Messaging cluster member of Node A. HTTP traffic is routed to all members of the AppTarget cluster, as shown by the orange dashed lines. Schedulers such as the BPEScheduler are running. Version v1 of appX is currently deployed to the application target of both nodes and is also seen in the master repository. We will deploy v2 of appX to the application target cluster, one node at a time. The Support and WebApp clusters are not shown here because we assume that the application is installed only to the AppTarget cluster. As we move from step to step, we highlight changes in red.

Figure 4. Beginning state

1. Disable node synchronization and restart the node agents for all nodes on which the application is installed.

We disable node synchronization so that the nodes do not immediately receive the new version of the application when it is updated using the deployment manager. We update the application on each node only once the node's cluster members have been gracefully shut down.

This is shown in Figure 5 by the absence of the blue lines connecting the node agents and the deployment manager.

Figure 5. Disable node synchronization for all nodes

2. Update the application module(s) using the deployment manager. Do not synchronize changes with nodes.

Install the update using wsadmin scripting or the Integrated Solutions Console. Use the Update feature to update the existing module rather than installing the update as a new module. Save changes to the master repository, but do not synchronize changes with the nodes.

As a result of this step, v2 of the module is in the master repository but not in each node's local configuration. Here we demonstrate updating only one module, but multiple modules can be updated at once as long as the changes are backward compatible (see step 3d).

Figure 6. Install update to master repository
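With wsadmin, the in-place update of step 2 corresponds to `AdminApp.update` with the `app` content type and the `-operation update -contents` options, followed by a save. The sketch below wraps those two calls; the application name and EAR path in the usage are hypothetical. It deliberately does not request synchronization, relying on step 1 to keep the nodes on the old version.

```python
def update_application(admin_app, admin_config, app_name, ear_path):
    """Step 2: replace an installed application with a new EAR in the master repository.

    admin_app / admin_config are wsadmin's AdminApp and AdminConfig objects
    (or test doubles). The change is saved but not pushed to the nodes.
    """
    # '-operation update -contents' updates the existing application in place
    # instead of installing the EAR as a separate application.
    # A path containing spaces would need quoting inside the option string.
    options = "[-operation update -contents %s]" % ear_path
    admin_app.update(app_name, "app", options)
    # Master repository only: with automatic sync disabled, nodes keep v1 for now.
    admin_config.save()
```

For several backward-compatible modules, the same call can be made once per module before moving on to step 3.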

3. For each node on which the application is installed, perform the following steps, one node at a time.

We illustrate this with Node A only; the same steps are then repeated for Node B to complete the update.

a. Stop all incoming traffic to cluster members on the node. This includes HTTP, JMS, MQ, and BPEL traffic. Unless session replication is configured, if HTTP sessions tied to cluster members are still active and could contain critical data, wait for them to close.

We stop traffic to cluster members on the node because we need to stop those cluster members gracefully in the next step. A cluster member does not shut down until all of its work has finished; assuming constant incoming traffic, we must redirect that traffic to give the servers room to shut down. Earlier in this section we described how to do this for HTTP, JMS, MQ, and BPEL traffic.

This step is illustrated in Figure 7 by the change in plugin-cfg.xml. As a result of the change, all HTTP requests are routed to the cluster member on Node B.

Figure 7. Stop incoming traffic to cluster members on the node
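For HTTP, the plugin-cfg.xml change in step 3a can be scripted. This Python sketch is one possible approach, not necessarily the one described earlier in this article: it assumes that setting a server's `LoadBalanceWeight` to 0 stops the plug-in from sending it new (non-affinity) requests, and that server names in plugin-cfg.xml follow the `<node>_<server>` pattern. Check both against the IBM HTTP Server plug-in load balancing technote listed in Resources. The cluster and server names in the usage are hypothetical.

```python
import xml.etree.ElementTree as ET


def drain_node(root, cluster_name, node_name):
    """Set LoadBalanceWeight to 0 for one node's members of one cluster.

    root is the parsed plugin-cfg.xml <Config> element. Only <Server> elements
    directly under the matching <ServerCluster> are changed (not the name
    references under PrimaryServers/BackupServers). Returns the changed names.
    """
    changed = []
    for cluster in root.iter("ServerCluster"):
        if cluster.get("Name") != cluster_name:
            continue
        for server in cluster.findall("Server"):
            # Assumed naming convention: "<node>_<server>".
            if server.get("Name", "").startswith(node_name + "_"):
                server.set("LoadBalanceWeight", "0")
                changed.append(server.get("Name"))
    return changed
```

After editing, the tree can be written back with `ET.ElementTree(root).write(...)`; the plug-in picks up the file on its next configuration reload.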

b. Stop the cluster members that the application uses on the node.

Be sure to stop the cluster members in the order given earlier in this section: application target, web, support, and messaging. For example, if your application is deployed to the application target cluster, first stop the application target cluster member. Then, because this application uses asynchronous invocation, stop the messaging cluster member. If the messaging engine is currently active on a cluster member on the node, it fails over to a different cluster member. This can take anywhere from a few seconds to a few minutes, depending on your environment, but you must wait for the failover to finish before continuing.

We stop the cluster members because we do not want them to process in-flight work while the module is being updated on the node.

The darkened boxes in Figure 8 show the cluster members that are stopped. Note that the messaging engine is now active on the messaging cluster member of Node B.

Figure 8. Stop cluster members on the node
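The ordering rule in step 3b, and its reverse in step 3d, is easy to get wrong in a hand-written script. A small helper can derive both sequences from the set of clusters an application uses. The role names below are labels for the four clusters in this topology and are assumptions; map them to your actual cluster names.

```python
# Stop order from step 3b: application target, web, support, messaging.
STOP_ORDER = ["AppTarget", "WebApp", "Support", "Messaging"]


def stop_sequence(clusters_used):
    """Return the cluster roles to stop on a node, in the required order."""
    unknown = set(clusters_used) - set(STOP_ORDER)
    if unknown:
        raise ValueError("unknown cluster roles: %s" % sorted(unknown))
    return [role for role in STOP_ORDER if role in clusters_used]


def start_sequence(clusters_used):
    """Start order (step 3d) is the reverse: messaging, support, web, application target."""
    return list(reversed(stop_sequence(clusters_used)))
```

For appX, which uses the AppTarget and Messaging clusters, `stop_sequence({"AppTarget", "Messaging"})` yields `["AppTarget", "Messaging"]`. A real script would still have to wait for messaging engine failover after stopping the messaging member.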

c. Enable node synchronization and restart the node agent. Trigger synchronization if the node is not configured to synchronize at startup. Wait for node synchronization to complete before moving to the next step.

As a result of node synchronization, the node agent receives the changes from the master configuration and updates the node's local configuration.

This step is shown in Figure 9 by the red line connecting the node agent to the deployment manager. v2 of the module is now present on the node's application target cluster member.

Figure 9. Enable node synchronization for the node
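"Wait for node synchronization to complete" can be turned into a bounded polling loop. This sketch assumes the node agent exposes a `NodeSync` MBean with `sync` and `isNodeSynchronized` operations reachable through wsadmin's `AdminControl`; the timeout and polling interval are arbitrary illustrative values. The `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import time


def sync_node_and_wait(admin_control, node_name, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Trigger a node synchronization and poll until the node reports it is in sync.

    admin_control is wsadmin's AdminControl object (or a test double).
    Returns the approximate number of seconds waited; raises on timeout.
    """
    query = "type=NodeSync,node=%s,*" % node_name
    mbean = admin_control.completeObjectName(query)
    if not mbean:
        raise RuntimeError("NodeSync MBean not found; is the node agent for %s running?" % node_name)
    admin_control.invoke(mbean, "sync")
    waited = 0
    while waited <= timeout_s:
        # wsadmin returns MBean results as strings, hence the comparison with "true".
        if admin_control.invoke(mbean, "isNodeSynchronized") == "true":
            return waited
        sleep(poll_s)
        waited += poll_s
    raise RuntimeError("node %s not synchronized after %d s" % (node_name, timeout_s))
```

Failing loudly on timeout matters here: starting cluster members on a node that still has v1 in its local configuration would silently defeat the update.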

d. Start all cluster members that you stopped in step 3b.

Now that the node contains the updated application, we can start its cluster members. Remember to do this in the correct order: messaging, support, web, and application target.

When the cluster members start, the MDBs and schedulers start automatically. Because we re-enable HTTP traffic in the next step, nothing other than the cluster members themselves needs to be restarted manually at this point.

As shown in Figure 10, there may be a short period in which the nodes are simultaneously running two different versions of the application. In this example, v1 and v2 of Module_4 are both putting messages on the destination queue for Module_5. As long as the changes are compatible, as described earlier, this is not a problem. For example, if Module_5 must be updated to be compatible with Module_4 v2, its new version must also be compatible with Module_4 v1.

Figure 10. Mixed module versions during transition period

To keep this period as short as possible, stop the cluster members on the next node as soon as this node's cluster members have started. If you script this, the transition can be made in a matter of seconds, assuming you have only two nodes. If the application must be updated on more than two nodes, you have the option of waiting to start the nodes running the new version until all nodes running the old version are stopped. In any case, test running the different versions simultaneously before attempting this. If you find any compatibility issues, do not use this update procedure.

This step is depicted in Figure 11 by the green cluster members, which signify that they are running again. Note that the messaging engine is still active on Node B's messaging cluster member. If the cluster member on Node A were configured as the preferred server, the messaging engine would become active on Node A at this point.

Figure 11. Start cluster members on the node
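The compatibility requirement in step 3d can be checked mechanically if you can describe what each module version produces and requires. The sketch below is a deliberately simplified model, not how SCA interfaces or business objects are actually compared: it treats a message as a set of field names and checks every producer/consumer version pair that might run side by side during the transition. The field names in the usage are hypothetical.

```python
def compatible(producer_fields, consumer_required):
    """A consumer can process a message if every field it requires is produced."""
    return set(consumer_required) <= set(producer_fields)


def incompatible_pairs(producers, consumers):
    """Check every producer/consumer version pair that may run at the same time.

    producers: {version: fields that producer version puts in its messages}
    consumers: {version: fields that consumer version requires}
    Returns the (producer_version, consumer_version) pairs that would fail.
    """
    return [(p, c)
            for p, fields in sorted(producers.items())
            for c, required in sorted(consumers.items())
            if not compatible(fields, required)]
```

Applied to Figure 10, the producers are Module_4 v1 and v2 and the consumers are the versions of Module_5. A non-empty result means that, in the terms of the text, this update procedure should not be used.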

e. If you disabled incoming HTTP traffic in step 3a, enable incoming HTTP traffic to cluster members on the node.

If another node needs to be updated, you can combine this step with step 3a for the next node by crafting plugin-cfg.xml appropriately. This helps to minimize the period in which different versions of the module are running at the same time.

Figure 12. Enable incoming HTTP traffic
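Combining step 3e for this node with step 3a for the next node means producing one plugin-cfg.xml in which the updated node is restored and the next node is drained. This sketch makes the same assumptions as a weight-based drain (weight 0 stops new non-affinity requests; server names follow `<node>_<server>`), and the restored weight of "2" is a hypothetical default that should match whatever your file originally contained.

```python
import xml.etree.ElementTree as ET


def shift_traffic(root, cluster_name, restore_node, drain_node, weight="2"):
    """Re-enable one node and drain the next in a single plugin-cfg.xml edit.

    root is the parsed <Config> element. Servers of restore_node get `weight`;
    servers of drain_node get 0. Other servers are left untouched.
    """
    for cluster in root.iter("ServerCluster"):
        if cluster.get("Name") != cluster_name:
            continue
        for server in cluster.findall("Server"):
            name = server.get("Name", "")
            # The "_" suffix keeps "NodeA" from also matching "NodeA1_...".
            if name.startswith(restore_node + "_"):
                server.set("LoadBalanceWeight", weight)
            elif name.startswith(drain_node + "_"):
                server.set("LoadBalanceWeight", "0")
    return root
```

To avoid the plug-in reading a half-written file, write the result to a temporary file in the same directory and rename it over plugin-cfg.xml.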

f. Repeat steps a through e for any remaining nodes on which the application has not been updated.

4. Resubmit any failed events that may have occurred during the procedure. If your application uses store-and-forward, forward any events that may have been stored.

5. Update is complete!

Figure 13. End state
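Putting step 3 together, a driver script runs sub-steps a through e on one node at a time, with step 3f being the loop itself. In this sketch the individual actions are passed in as callables (for example, wrappers around wsadmin calls and plugin-cfg.xml edits like those sketched above); the action names are hypothetical. Any exception from an action propagates and aborts the remaining steps and nodes, so a problem stays confined to the node being updated while the other nodes keep serving.

```python
def rolling_update(nodes, actions, log=print):
    """Run steps 3a-3e on each node in turn; step 3f is this loop.

    nodes: node names in update order.
    actions: {step name: callable taking a node name}.
    Step 4 (resubmitting failed events) is not included.
    """
    steps = ["stop_traffic", "stop_members", "enable_sync", "start_members", "restore_traffic"]
    missing = [s for s in steps if s not in actions]
    if missing:
        raise KeyError("missing actions: %s" % ", ".join(missing))
    for node in nodes:
        for step in steps:
            log("%s: %s" % (node, step))
            actions[step](node)  # an exception here stops the whole rollout
```

Keeping the steps as named callables also makes it straightforward to combine "restore_traffic" for one node with "stop_traffic" for the next, as suggested in step 3e.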

Variation: Process Server fix pack installation

By extending the application update procedure described in the previous section, we can also approach continuous availability during WebSphere Process Server fix pack installation. The general approach remains the same: apply the update to one node at a time, routing traffic away from the node while the fix pack is being installed.

Note that in the case of an upgrade, the health of the whole cell is at risk rather than the health of only one application. Therefore, we strongly advise against attempting this procedure in a production environment before testing it thoroughly in a similar test environment.

Also note that we have tested this procedure only when upgrading from V7.0.0.4 to V7.0.0.5, so we cannot guarantee that the same approach will succeed in other cases. One assumption we make is that the fix pack requires no database updates. This is because at some point during this procedure we will have a "mixed cell"; that is, clusters will have members running different Process Server versions at the same time. This could cause severe problems if there is a database incompatibility.
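Because the procedure relies on a mixed cell, a script might refuse to proceed unless the target level differs from the current level only in the fix pack field, which matches the one case we tested. The sketch below is a hypothetical pre-flight check. It cannot tell you whether a fix pack needs database updates; that still has to be confirmed from the fix pack's own instructions.

```python
def parse_level(level):
    """Turn a product level such as '7.0.0.5' into a comparable tuple (7, 0, 0, 5)."""
    return tuple(int(part) for part in level.split("."))


def is_fix_pack_step(current, target):
    """True if target is newer than current and differs only in the last (fix pack) field.

    Mirrors the tested case 7.0.0.4 -> 7.0.0.5; anything else (a new release,
    a downgrade, a malformed level) is rejected.
    """
    cur, tgt = parse_level(current), parse_level(target)
    return len(cur) == len(tgt) == 4 and cur[:3] == tgt[:3] and tgt[3] > cur[3]
```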

We outline the procedure we used to achieve continuous availability during an upgrade from V7.0.0.4 to V7.0.0.5. First read Special Instructions for WebSphere Process Server and WebSphere Enterprise Service Bus V7.0.0 Fix Pack 5 (V7.0.0.5) for the official instructions to upgrade with minimum downtime. The method we present here modifies those instructions, accomplishing essentially the same end result (an upgraded cell) but in a way that allows near-continuous availability. For each step, refer to the corresponding step in the official instructions for more details.

Process Server fix pack installation procedure

Before starting the upgrade, the Process Server cell is in its normal operational state. To demonstrate this procedure, we use the deployment environment shown in Figure 14, which consists of a deployment manager node and two custom nodes, both running WebSphere Process Server V7.0.0.4. Because the nodes are hosted on separate machines, each can be upgraded individually. At this point, all JVMs are running, including the deployment manager, all cluster members, and all node agents. Automatic node synchronization is enabled for both nodes, as represented by the blue lines between the node agents and the deployment manager. The messaging engine (ME) is active on the Messaging cluster member of Node A. HTTP traffic is routed to all members of the AppTarget cluster, as shown by the orange dashed lines. For brevity, we show only the application target and messaging clusters, but in reality there may be more clusters. We show this diagram periodically as we move through the procedure, highlighting changes in red.

Figure 14. Beginning state

1. Disable node synchronization and restart the node agents of all nodes.

2. Stop the deployment manager.

3. Install the fix pack to the deployment manager's installation root.

4. Start the deployment manager.

Figure 15. Fix pack installed to deployment manager
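Steps 1 through 4 upgrade the deployment manager first because a deployment manager must be at the same level as, or a higher level than, the nodes it manages. A script can assert that invariant before touching each node. The helper below is a hypothetical sketch that raises if any node is ahead of the deployment manager and otherwise reports which nodes are still behind.

```python
def parse_level(level):
    """Turn a product level such as '7.0.0.5' into a comparable tuple."""
    return tuple(int(part) for part in level.split("."))


def check_cell_levels(dmgr_level, node_levels):
    """Validate a (possibly mixed) cell and list the nodes still to be upgraded.

    node_levels: {node name: level string}.
    Raises ValueError if any node is at a higher level than the deployment manager.
    """
    dmgr = parse_level(dmgr_level)
    too_new = sorted(n for n, lvl in node_levels.items() if parse_level(lvl) > dmgr)
    if too_new:
        raise ValueError("nodes ahead of the deployment manager: %s" % ", ".join(too_new))
    # Tuple comparison orders levels field by field, e.g. (7,0,0,4) < (7,0,0,5).
    return sorted(n for n, lvl in node_levels.items() if parse_level(lvl) < dmgr)
```

During step 5, the returned list shrinks by one node per iteration; an empty list corresponds to the end state in which the whole cell is at the new level.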

5. For each node you wish to upgrade, complete the following steps, one node at a time:

a. Stop all incoming traffic to cluster members on the node. This includes HTTP, JMS, MQ, and BPEL traffic. If HTTP sessions tied to cluster members are still active, wait for them to close.

b. Stop all cluster members on the node.

c. Stop the node agent of the node.

Figure 16. Cluster members and node agents stopped on the node

d. Install the fix pack to the node's installation root.

Figure 17. Fix pack installed to node

e. From the deployment manager, run the profile upgrade script for each cluster that has members on this node. This script should be run only once per cluster per cell, so if all of these clusters have already been upgraded, skip this step.

f. If Business Space is configured and the templates and spaces are hosted on a cluster member on this node, perform steps 1a and 1b of Updating Business Space templates and spaces after installing or updating widgets. That article also helps you determine which cluster member hosts the templates and spaces. We recommend that the first node you upgrade be the one that hosts the templates and spaces.
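Steps 5e and 5f both depend on bookkeeping across nodes: the profile upgrade script runs once per cluster per cell, and the Business Space host should be upgraded first. The sketch below computes, for a proposed node order, which clusters still need the script when each node is upgraded. The cluster membership and node names in the usage are hypothetical, and the script itself is not named here.

```python
def plan_fix_pack(nodes, members_of, business_space_host=None):
    """Order nodes and list, per node, the clusters that still need the profile upgrade script.

    nodes: node names in the preferred order.
    members_of: {node: set of cluster names that have members on that node}.
    business_space_host: node hosting the Business Space templates and spaces, if any.
    Returns [(node, [clusters to run the script for])] in upgrade order.
    """
    # Step 5f recommendation: upgrade the Business Space host first; keep the rest in order.
    order = [n for n in nodes if n == business_space_host]
    order += [n for n in nodes if n != business_space_host]
    upgraded = set()  # clusters whose script has already run in this cell
    plan = []
    for node in order:
        pending = sorted(set(members_of.get(node, ())) - upgraded)
        upgraded.update(pending)
        plan.append((node, pending))
    return plan
```

An empty list for a node means step 5e can be skipped for that node.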

g. Enable node synchronization and restart the node agent. Trigger synchronization if the node is not configured to synchronize at startup. Wait for node synchronization to complete before moving to the next step.

h. Start all cluster members that you stopped in step 5b.

i. If you disabled incoming HTTP traffic in step 5a, enable incoming HTTP traffic to cluster members on the node.

Figure 18. Cluster members started, node synchronization enabled for the node

j. Repeat steps a through i for any remaining nodes to which the fix pack has not been applied.

6. Resubmit any failed events that may have occurred during the procedure. If your module uses store-and-forward, forward any events that may have been stored.

7. Fix pack installation is complete!

Figure 19. End state

Conclusion

By carefully planning the topology and crafting scripts that update business process solution components, you can greatly improve availability. This enables you to use BPM applications in environments where near-continuous availability is required. In addition to the topology, we have outlined a set of application design guidelines and given procedures for updating various components of Process Server applications. While the techniques for approaching continuous availability will need to be adapted to the specific installation and configuration of a particular organization, the overall approach should be clear.

In the future, expect to see additional information on how to approach a similar level of availability using IBM BPM V8 or later. When IBM BPM V8 is used in an EAR-based "Process Server only" mode, the approach and details provided in this article should apply equally well. In that environment, however, you will have new challenges and new opportunities, given the presence of the Process Center and the addition of different types of authored artifacts contained in process applications and toolkits.

Acknowledgements

The authors would like to thank Karri Carlson-Neumann for her reviews and suggestions for this article.



Resources

WebSphere Business Process Management V7 Production Topologies

Asynchronous Processing in WebSphere Process Server

IBM Business Process Management V7 Information Center

Understanding IBM HTTP Server plug-in Load Balancing in a clustered environment

Using the store-and-forward feature in WebSphere Process Server v7.0

Special Instructions for WebSphere Process Server and WebSphere Enterprise Service Bus V7.0.0 Fix Pack 5 (V7.0.0.5)

developerWorks BPM zone: Get the latest technical resources on IBM BPM solutions, including downloads, demos, articles, tutorials, events, webcasts, and more.

IBM BPM Journal: Get the latest articles and columns on BPM solutions in this quarterly journal, also available in both Kindle and PDF versions.


Download

Description                        Name         Size
Scripts for use with this article  scripts.zip  4KB

