Page 1: Architecting Mission Critical Applications Don’t forget the Instrumentation!

WWW.EQUINITI.COM SHARE REGISTRATION + RETAIL INVESTOR SERVICES + EMPLOYEE BENEFITS

Architecting Mission Critical Applications

Don’t forget the Instrumentation!

Mike Jolliffe, Chief Technology Officer

Overview of Equiniti

+ Market Leader in UK Share Registration Services

+ Partnering around 57% of the FTSE 100 & 40% of FTSE 250

+ Manage over 24 million shareholder accounts

+ Offices in London, Worthing, Birmingham, Bristol, Edinburgh and Jersey

+ Separated from Lloyds TSB Group on October 1st 2007 after 50 years

+ Emphasis on growth

What makes an application ‘Mission Critical’?

A Business dependency on that application for core business activity

Tests such as these may help you determine the importance of the application:-

+ Would the business survive a major outage of this application?

+ A need for “high-9’s” availability over a normal processing period

+ Regulatory drivers for the application’s availability – such as CREST settlement capability
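
To make the “high-9’s” test concrete, the arithmetic can be sketched as below. This is a minimal illustration in Python (Sirius itself is C# / .Net); the availability targets and the processing window are assumed values, not figures from the presentation.

```python
# Illustrative arithmetic only: the targets and the processing window below
# are assumptions, not figures from the presentation.

def allowed_downtime_minutes(availability_pct: float, window_hours: float) -> float:
    """Minutes of outage permitted in a window at a given availability target."""
    return window_hours * 60 * (1 - availability_pct / 100)

# Assumed "normal processing period": 12 hours a day, 5 days a week, 52 weeks.
WINDOW_HOURS_PER_YEAR = 12 * 5 * 52

for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target, WINDOW_HOURS_PER_YEAR)
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions a 99.9% target leaves roughly 187 minutes of outage a year inside the processing window, which is the kind of number to agree with the business before design starts.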

The Goals of this Presentation

+ To highlight that whilst there is focus on infrastructure availability, there is not always the same degree of attention to application stability

+ To make a case for investing in the design effort for planning for application failures.

+ To show that understanding the User’s perspective on the application’s behaviour helps both the successful development of the application, and all of the ongoing support effort

Our Mission Critical Application

For Equiniti, a system we call Sirius is at the core of what we do for our Share Registration and Employee Share Scheme clients. This is our Mission Critical Application.

+ Started in 2003, live from Q1 2006, as a £40m project to re-engineer our processes and replace our aging OpenVMS systems. This was always going to be more than a straight application rewrite.

+ The development was led by Accenture, with HP and Microsoft providing Hardware and Software resources respectively

+ To date this is over 140,000 man days of effort, producing over 2m lines of code, and 2500 classes. We continue to extend the system for new capabilities as we as a business expand – this is not a static application!

+ Integrated custom workflow, Imaging, real time work prioritisation.

+ Technology stack is Windows 2003 R2, .Net Framework 3.0, SQLServer 2005. We started out on Framework 1.1 and SQLServer 2000.

What makes a ‘Mission Critical Application’ successful

Application Design characteristics such as:-

+ Componentisation / Abstraction delivering Clearly defined interfaces and Service boundaries

+ Interoperability through those services to other internal or external systems e.g. Integration to Call Centre or Website technologies to re-use functionality already developed for one channel through other delivery mechanisms

+ Flexibility and adaptability of the actual application to changing business needs

Infrastructure Design & Non-functional characteristics such as:-

+ Design for Availability

+ Design for Scalability

+ Processing Performance

+ Data Integrity – i.e. No committed transactions could be lost as part of a system failure

So you have a ‘Standard’ application… e.g. Sirius

C# / ASP.Net UI for Internal business users

C# classes for business logic and data access via Genome (ORM+)

+ ORM – Object Relational Mapping

WCF based web service call

SQLServer 2005

Internal UI is just one Channel - any Channel can use the same web services

Takes 1.4 million hits per day on average

Database of 2TB

Some tables partitioned for size

Some tables partitioned to achieve data deletions

The application exposes web services to be consumed by different channels

Over 2500 classes each providing methods to achieve specific business functions

[Diagram: Classic 3-Tier. Tiers labelled Web (any channel), App and Database]

Sirius is physically deployed like this

[Deployment diagram. Labels: Users; NLB (two); multiple Web and App servers; one server marked X; Cluster with Active and Passive nodes + SAN storage]

Sirius is physically deployed like this

[Deployment diagram. Labels: Data Centre 1; Data Centre 2; NLB (two); SAN storage (two); Synchronously mirrored SAN storage; Resilient High speed fibre network; 3rd location; Triangulation for DR]

Mission Accomplished!

Is the Application as resilient as the Infrastructure?

Applications must be architected to be as resilient as the Infrastructure – and to highlight when they fail and what caused the failure.

+ Do you architect into the application, from the outset, the basic needs of fault diagnosis?

+ You measure infrastructure resilience on the time to recover from an outage, if it’s even detectable by the end user. Do you do that for your Application?

+ It is not about writing endless logs (but the quality of log entries on a failure is important).

+ It is about instrumentation in your application that tells you in near-real-time what’s happening.

+ For a successful Application you need to be able to:-
  + Detect there is a problem (before the users flood the service desk with calls)
  + Restart failed services &/or Recover the damage that might have been done by a failed process.
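
One way to picture near-real-time instrumentation is a rolling window over recent calls that reports a status before users notice. The sketch below is in Python for brevity (Sirius is C# / .Net); the class name `RollingMonitor`, the window length and both thresholds are hypothetical choices, not part of the presentation.

```python
import time
from collections import deque


class RollingMonitor:
    """Keeps recent call outcomes and flags trouble (hypothetical helper)."""

    def __init__(self, window_seconds=60.0, max_error_rate=0.05, max_avg_seconds=2.0):
        self.window_seconds = window_seconds
        self.max_error_rate = max_error_rate      # assumed threshold
        self.max_avg_seconds = max_avg_seconds    # assumed threshold
        self._samples = deque()  # (timestamp, duration_seconds, succeeded)

    def record(self, duration_seconds, succeeded, now=None):
        """Store one call outcome and drop samples older than the window."""
        now = time.monotonic() if now is None else now
        self._samples.append((now, duration_seconds, succeeded))
        self._trim(now)

    def _trim(self, now):
        while self._samples and now - self._samples[0][0] > self.window_seconds:
            self._samples.popleft()

    def call(self, func, *args, **kwargs):
        """Run func, timing it and recording success or failure."""
        start = time.monotonic()
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.record(time.monotonic() - start, succeeded=False)
            raise
        self.record(time.monotonic() - start, succeeded=True)
        return result

    def status(self, now=None):
        """Return 'no-data', 'failing', 'slow' or 'healthy' for the window."""
        now = time.monotonic() if now is None else now
        self._trim(now)
        if not self._samples:
            return "no-data"
        errors = sum(1 for _, _, ok in self._samples if not ok)
        average = sum(d for _, d, _ in self._samples) / len(self._samples)
        if errors / len(self._samples) > self.max_error_rate:
            return "failing"
        if average > self.max_avg_seconds:
            return "slow"
        return "healthy"
```

A monitoring agent could poll `status()` every few seconds and raise an alert on anything other than "healthy". Treating "slow" as a reportable state reflects the point that, to users, slow and unavailable are close to the same thing.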

The End User Perspective

+ Users see ‘System Availability’ as their “ability to use the application” – which is not the same as the infrastructure being up and running

+ It means that the application must be up, running, and performing fast enough for them to get their work done.

+ Before you start architecting the solution ensure you understand what your users expect you to achieve in terms of availability and performance – as to them they tend to be one and the same thing. If it goes slowly it can be almost as bad as not being available at all.

+ Determine ahead of the development what the impacts of failure will be – this helps drive the right architectural and non-functional requirements for a Mission Critical App.

  + The cost of downtime – lost revenue (£’s)
  + Reputational damage – financial impact (£’s)
  + Regulatory breaches & potentially financial penalties (£’s)

Steps to take in the application

+ As Architects you must consider from the start how errors will be handled within the application and ensure that development standards reflect your decisions

+ Developers must implement proper error trapping, and make informed decisions about how they raise each error and its degree of criticality.
  + Should ‘retries’ be coded in the app (such as for a timeout during a cluster failover)?
  + Should the error be raised to the calling process / written to the Event Log to allow a graceful failure?

+ What goes into the Event Log must be meaningful and complete
  + Unique error description and number – this allows tools such as System Center to pick up the error.
  + Have pre-defined actions configured for System Center wherever the corrective action is clear from the error code. Treat changes/updates in these actions as part of future code releases so they get deployed with application patches that might change the recovery action.

+ Ensure precise details of the failing component are recorded, including the call stack
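
The decisions above (retry or not, what to log, how to fail) might combine as in the following sketch. It is written in Python for illustration, whereas Sirius is C# / .Net writing to the Windows Event Log; the exception class, the error number 5001, the catalogue and the logger name are all hypothetical.

```python
import logging
import time
import traceback

logger = logging.getLogger("sirius.sketch")  # hypothetical logger name


class TransientError(Exception):
    """A failure that may clear on its own, e.g. a timeout during failover."""


# Hypothetical catalogue: each error gets a unique number and description so
# that a monitoring tool can match on it and trigger a pre-defined action.
ERROR_CATALOGUE = {
    5001: "Operation still failing after all retries",
}


def call_with_retries(operation, attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Retry transient failures, then log one complete entry and re-raise."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                # Final failure: record number, description, failing
                # component and the call stack in a single log entry.
                logger.error(
                    "error_number=%d description=%s component=%s attempts=%d\n%s",
                    5001,
                    ERROR_CATALOGUE[5001],
                    getattr(operation, "__name__", repr(operation)),
                    attempts,
                    traceback.format_exc(),
                )
                raise
            sleep(delay_seconds * attempt)  # simple linear back-off
```

Only errors classed as transient are retried; anything else propagates immediately so the caller can fail gracefully. Re-raising after logging keeps the decision about user-facing behaviour with the calling process.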

Steps to take in the application

+ In the case of Windows Services especially, build in support for using WMI to monitor the service.

+ Over and above any monitoring tool output, consider what reports you can provide Service Management with that will give early warning of problems such as performance degradation.

+ In addition to any application specific tables you can analyse, some other great sources of information come for ‘free’
  + IIS logs – Load to a database every 60 seconds via a SQLAgent job and the Log Parser tool to get a picture of interactive page performance
  + SQLServer 2005 Dynamic Management Views for query performance and resource utilisation
  + Infrastructure performance data from Perfmon or WMI calls to show hotspots as they occur

+ These types of reports tell you about the Application, but they also tell you about how your users make use of your Application. This feeds into planning for infrastructure, support and enhancements
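
The IIS log idea can be sketched as follows. The presentation's actual pipeline uses a SQLAgent job and Log Parser loading into SQLServer; this stand-in uses Python and an in-memory SQLite database purely to show the shape of the data. The sample log lines, page names and table layout are invented for illustration; the `#Fields:` directive and the `time-taken` field (milliseconds) follow the W3C extended log format that IIS writes.

```python
import sqlite3

# Invented sample in W3C extended log format; page names are hypothetical.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-uri-stem sc-status time-taken
2009-03-26 11:30:05 /Holder/Search.aspx 200 310
2009-03-26 11:30:40 /Holder/Search.aspx 200 450
2009-03-26 11:31:10 /Transfer/Edit.aspx 500 2900
"""


def parse_w3c(text):
    """Yield one dict per log entry, keyed by the names in the #Fields line."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split()))


def load(conn, rows):
    """Insert parsed entries into a simple hits table (assumed layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (ts TEXT, page TEXT, status INTEGER, ms INTEGER)"
    )
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?, ?)",
        [
            (r["date"] + " " + r["time"], r["cs-uri-stem"],
             int(r["sc-status"]), int(r["time-taken"]))
            for r in rows
        ],
    )


def per_minute(conn):
    """Hit count and average response time (ms) for each minute."""
    return conn.execute(
        "SELECT substr(ts, 1, 16) AS minute, COUNT(*), AVG(ms) "
        "FROM hits GROUP BY minute ORDER BY minute"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, parse_w3c(SAMPLE_LOG))
for minute, hit_count, avg_ms in per_minute(conn):
    print(minute, hit_count, avg_ms)
```

Once the hits are in a table, the same data answers both operational questions (is response time degrading right now?) and planning questions (which pages do users actually spend their day in?).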

Sirius status reports – End user performance

An analysis of the IIS weblogs from each webserver, imported into a database and displayed via Reporting Services

Sirius status reports – End user performance

Combining the interactive response with a graph that shows background processes allows correlation of performance dips with the tasks that may be causing them, and hence allows better scheduling.

[Graph: Sirius R2+ Transaction System Performance Graph (Mon-Fri) Extended. Axis: INTRADAY (minutes). Results for 26-mar-2009 11:30:00 to 26-mar-2009 12:15:00]
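
The correlation described on this slide can be approximated in code as well as on a graph. The sketch below, in Python for illustration, takes per-minute average response times and a list of background job windows and reports which jobs were running during each slow minute. The function names, job name, timestamps and threshold are all hypothetical.

```python
from datetime import datetime


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def jobs_running_during_dips(minute_avgs, jobs, threshold_ms):
    """Map each slow minute to the background jobs active in that minute.

    minute_avgs: list of (minute, average_ms) pairs
    jobs: list of (name, start, end) tuples with inclusive start and end
    """
    result = {}
    for minute, avg_ms in minute_avgs:
        if avg_ms <= threshold_ms:
            continue  # this minute met expectations
        moment = parse(minute)
        result[minute] = [
            name for name, start, end in jobs
            if parse(start) <= moment <= parse(end)
        ]
    return result


# Invented figures for illustration only.
minute_avgs = [
    ("2009-03-26 11:30", 380.0),
    ("2009-03-26 11:45", 2400.0),
    ("2009-03-26 12:00", 2100.0),
]
jobs = [("dividend_batch", "2009-03-26 11:40", "2009-03-26 11:50")]
print(jobs_running_during_dips(minute_avgs, jobs, threshold_ms=1000))
```

A slow minute with no overlapping job is as informative as one with an obvious culprit: it points the investigation away from scheduling and towards the interactive workload itself.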

Sirius status reports – Performance by Transaction

Analysis of the performance by page name is used to highlight those pages that fall outside performance expectations and allows prioritisation of development resource to tune that process. This report supports drilling down through multiple levels to see specific details
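
A minimal sketch of the per-page analysis, assuming hits have already been extracted as (page, milliseconds) pairs. It is in Python for illustration; the page names, targets and the choice of the median as the summary statistic are assumptions, not details from the report shown.

```python
from collections import defaultdict
from statistics import median


def slow_pages(hits, default_target_ms, targets=None):
    """Return pages whose median response time exceeds their target.

    Each row is (page, median_ms, target_ms, sample_count), ordered so the
    page furthest over its target comes first.
    """
    targets = targets or {}
    by_page = defaultdict(list)
    for page, ms in hits:
        by_page[page].append(ms)

    report = []
    for page, samples in by_page.items():
        target = targets.get(page, default_target_ms)
        med = median(samples)
        if med > target:
            report.append((page, med, target, len(samples)))
    return sorted(report, key=lambda row: row[1] - row[2], reverse=True)


# Invented sample data; page names and targets are hypothetical.
hits = [
    ("/a.aspx", 100), ("/a.aspx", 300),
    ("/b.aspx", 2000), ("/b.aspx", 4000),
    ("/c.aspx", 900),
]
for row in slow_pages(hits, default_target_ms=500, targets={"/c.aspx": 800}):
    print(row)
```

Ordering by how far a page is over its target, rather than by raw response time, is one way to prioritise tuning effort; weighting by hit count would be another reasonable choice.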

Final Thoughts

+ Plan for the Application failing in the same way that we already plan for hardware / networks failing. Get a framework for error management in place and document the ‘big’ scenarios; you won’t catch all of the smaller ones in design. Then tune the error management process during testing.

+ Architect-in the needs of the support teams who will have to diagnose and fix application failures. If they don’t have the information they need recorded by the failure event, then the time to rectify is greatly extended.

+ Before you start architecting the solution, ensure you understand what your users expect you to achieve in terms of availability and performance, and all the consequences of not achieving these requirements.

+ Share your findings about the application usage with the business end users – this can help them change their work patterns, process flows etc. to maximise the system’s potential.

References

+ System Center
  + Home page: http://www.microsoft.com/systemcenter/operationsmanager/en/us/default.aspx

+ SQLServer Reporting
  + IIS Reports starter pack: http://www.microsoft.com/downloads/details.aspx?FamilyID=2805D337-14C7-40E3-820B-E7EE653C68C0&displaylang=en

+ Contact details: [email protected]

+ Shareview – the Shareholder & Investor portal
  + http://www.Shareview.co.uk
