
Mark Jones Senior Product Manager How Automation Can Help You: Use Cases for NetIQ Aegis™


Our Vision For IT Process Automation: 3 Years In The Making

3 years ago NetIQ had a vision for converging our systems & security management products to support consolidated incident & event handling.

But customers said: “Help us connect to our other tools as well. We’re the Noah’s Ark of tools: we have two of everything.” (VP of Operations at a major financial institution)

So we altered our plan to give customers greater control of the tools they’ve already invested in by creating a strategy for heterogeneous IT Process Automation (ITPA).


Introducing NetIQ® Aegis™: The Control & Automation Platform for IT Processes

NetIQ Aegis is a software platform that models, automates, measures and improves run books and ITIL-based processes, bringing control and automation to IT Operations

Diagram labels: ITIL Process (macro); Run Books (micro); Model, Automate, Measure, Improve


Use Case #1: Sympathetic Event Correlation

1. SQL Server down event
2. AppManager receives event
• From the agent on the server
3. AppManager event triggers an Aegis workflow
• Correlation engine begins listening for sympathetic events that match rules
4. AppManager receives sympathetic access failure events
• From application and web servers
5. Aegis’ correlation engine sees the sympathetic events
• And matches them to pre-defined rules
6. Aegis closes the sympathetic events
• Reducing the volume of AppManager events to be dealt with
• Update comments in the original event accordingly

Additional correlation examples:
• Suppress machine down events from hosts on attached subnets when a router fails
• Identify root cause from multiple events, e.g. a congested network segment identified by a combination of Network ResponseTime events, and high queue lengths on some Exchange servers

Diagram labels: NetIQ Aegis; NetIQ AppManager; Database Server; Web Server; Application Server
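The correlation flow in Use Case #1 can be illustrated with a minimal Python sketch. It is not Aegis or AppManager code: the `Event` and `Rule` classes, the `dependents` topology map and the event kind names are all hypothetical, chosen only to show how a root event plus pre-defined rules could be used to close sympathetic events and annotate the original one.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    host: str
    kind: str
    timestamp: float  # seconds on any common clock
    status: str = "open"
    comments: list = field(default_factory=list)


@dataclass
class Rule:
    root_kind: str         # e.g. "sql_server_down" (assumed name)
    sympathetic_kind: str  # e.g. "access_failure" (assumed name)
    window_seconds: float  # how long after the root event to keep listening


def correlate(root, incoming, rules, dependents):
    """Close open events that a rule marks as sympathetic to the root event.

    dependents maps a host to the set of hosts that rely on it. This is
    hypothetical topology data; the slides do not say where it comes from.
    """
    closed = []
    affected_hosts = dependents.get(root.host, set())
    for ev in incoming:
        if ev.status != "open" or ev.host not in affected_hosts:
            continue
        delay = ev.timestamp - root.timestamp
        for rule in rules:
            if (rule.root_kind == root.kind
                    and rule.sympathetic_kind == ev.kind
                    and 0 <= delay <= rule.window_seconds):
                ev.status = "closed"
                ev.comments.append(f"Sympathetic to event {root.event_id}")
                # Step 6: record the suppression on the original event.
                root.comments.append(f"Closed event {ev.event_id} from {ev.host}")
                closed.append(ev)
                break
    return closed
```

Keeping the rules as data rather than code mirrors the slide's "pre-defined rules": the router and subnet example would then be another `Rule` plus a different `dependents` map, under the same assumptions.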


Use Case #2: Managing Maintenance Modes

1. Application owner sends an email request to set maintenance mode
• Using an Outlook form
2. Aegis receives the email and parses it
• Identifies the resource to set maintenance mode on and the time window
3. Aegis sends a reminder email before the start of maintenance
• With an opportunity to cancel via email
4. Aegis sets the maintenance mode in AppManager
• On the right machine at the right time
5. Administrator performs maintenance
6. Aegis sends a reminder email before the expiration of maintenance
• With an opportunity to “snooze” or extend via email
7. Aegis stops maintenance mode
• On time with no further approval
8. Aegis sends email confirming maintenance stoppage

Diagram labels: NetIQ Aegis; NetIQ AppManager; Application Owner; Outlook Form
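As a rough illustration of steps 2 to 8 of Use Case #2, the sketch below parses a maintenance request and derives the reminder and start/stop times. The request layout (`Resource:`, `Start:`, `End:` lines), the timestamp format and the 30-minute reminder lead are assumptions made for the example; the slides do not describe the actual Outlook form fields or how Aegis parses them.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format in the request


def parse_request(body):
    """Extract the resource and time window from a 'Key: value' email body."""
    fields = {}
    for line in body.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)  # split on the first colon only
            fields[key.strip().lower()] = value.strip()
    start = datetime.strptime(fields["start"], TIME_FORMAT)
    end = datetime.strptime(fields["end"], TIME_FORMAT)
    if end <= start:
        raise ValueError("maintenance window must end after it starts")
    return fields["resource"], start, end


def build_schedule(start, end, lead=timedelta(minutes=30)):
    """Return (time, action) pairs in the order the workflow would run them."""
    return [
        (start - lead, "send start reminder (owner may cancel)"),
        (start, "set maintenance mode"),
        (end - lead, "send expiry reminder (owner may snooze or extend)"),
        (end, "stop maintenance mode and send confirmation"),
    ]


def extend_window(end, minutes):
    """Handle a 'snooze' reply by pushing the end of the window back."""
    return end + timedelta(minutes=minutes)
```

A caller would rebuild the schedule after `extend_window`, so the expiry reminder and the stop action both move with the new end time.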


Use Case #3: Low Disk Space Response

1. Available disk space falls below threshold
• Likely caused by temp file growth
2. AppManager detects condition
• AppManager Knowledge Script generates event
3. Aegis requests disk usage analysis from AppManager
• Identify top N culprits by folder, file type, age
• Extra attention on known temp file storage areas
4. Aegis sends email to admin requesting approval to clean up
• Embed results of disk usage analysis & link to Aegis web site
5. Administrator approves partial cleanup through Aegis (or by replying to email)
• Admin can select individual folders or file types for deletion, archiving or user attention
6. Aegis commands AppManager to perform cleanup
• Delete approved files and analyze new disk space status
7. Aegis sends confirmation email to admin
• Identify files deleted and new disk space status

Diagram labels: NetIQ AppManager; NetIQ Aegis; Admin; AppManager Agent; Archive; Trash
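The disk usage analysis in step 3 of Use Case #3 and the approved cleanup in steps 5 and 6 can be sketched as follows. This is a simplified stand-in, not an AppManager Knowledge Script: it works on a list of `(path, size_bytes, age_days)` tuples that some collector is assumed to have gathered, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def top_culprits(files, n=3):
    """Rank folders and file types by total bytes used.

    files is an iterable of (path, size_bytes, age_days) tuples.
    Returns two lists of (name, total_bytes), largest first.
    """
    by_folder, by_type = Counter(), Counter()
    for path, size, _age in files:
        p = PurePosixPath(path)
        by_folder[str(p.parent)] += size
        by_type[p.suffix.lower() or "(none)"] += size
    return by_folder.most_common(n), by_type.most_common(n)


def cleanup_candidates(files, approved_folders, min_age_days):
    """List files inside admin-approved folders that are old enough to remove."""
    return [
        path for path, _size, age in files
        if str(PurePosixPath(path).parent) in approved_folders
        and age >= min_age_days
    ]
```

Separating the ranking from the candidate selection keeps the approval step in the middle, as on the slide: nothing is selected for deletion until the admin has named the folders.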


Use Case #4: VM Dynamic Performance Management

1. Detect poor performance on VM-hosted service
• Performance problem detected by AppManager ResponseTime
2. Identify VM host with spare capacity
3. Gain approval to provision new VMs
• Send email to admin with proposed changes, requesting approval to automatically respond
4. Provision new VM guest
• Clone VM, configure LAN settings, etc & boot
5. Apply post-image updates per corp standard
• Patches, configuration updates since VM image was created
6. Configure applications
• Machine-specific settings required on guest and other machines in business service
7. Validate application function
• Verify proper application function before bringing into production
8. Bring new guest into production rotation
• Configure load balancer, application controller or similar
9. Verify improved service performance
• Repeat as necessary for up to 3 new guests total

Diagram labels: NetIQ Aegis; NetIQ AppManager; Admin; VMware Virtual Center; Attachmate WinINSTALL; Load Balancer or Controller; VMware ESX Hosts; Critical Business Service
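The control loop implied by Use Case #4 (detect, pick a host, get approval, provision, re-check, at most 3 new guests) can be sketched in a few lines. Everything here is hypothetical: host capacity is reduced to free memory in GB, and `is_slow`, `approve` and `provision` are caller-supplied callbacks standing in for the monitoring, email approval and cloning steps, which the slides attribute to other tools.

```python
MAX_NEW_GUESTS = 3  # the slide caps the response at 3 new guests in total


def pick_host(hosts, needed_gb):
    """Return the host with the most free memory that can fit a new guest."""
    candidates = {name: free for name, free in hosts.items() if free >= needed_gb}
    return max(candidates, key=candidates.get) if candidates else None


def scale_out(hosts, needed_gb, is_slow, approve, provision):
    """Add guests one at a time while the service is still slow.

    hosts maps host name to free memory in GB and is updated in place.
    Returns the list of hosts that received a new guest.
    """
    added = []
    while is_slow() and len(added) < MAX_NEW_GUESTS:
        host = pick_host(hosts, needed_gb)
        if host is None or not approve(host):
            break  # no capacity, or the admin declined
        provision(host)  # clone, patch, configure, validate, add to rotation
        hosts[host] -= needed_gb
        added.append(host)
    return added
```

Re-checking `is_slow()` before every iteration corresponds to step 9: the loop stops as soon as performance recovers rather than always provisioning the maximum.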


Use Case #5: Web Server Sequential Restart

1. Admin applies a patch to all web servers
• Reboot needed to finalize
2. Admin initiates “Restart Web Farm” Runbook
• Customized runbook automated by Aegis
3. Aegis blocks new sessions to first server
• Uses NetIQ AppManager to configure load balancer
4. Aegis commands AppManager to monitor for server to reach zero active sessions
• Users “bleed” off as they end their sessions on their own; AppManager sends event when zero sessions remain
5. Aegis commands AppManager to restart the web server
• Aegis waits for notification that reboot is complete
6. Aegis commands AppManager to test basic functionality
• Verify that web server properly performs expected duties
7. Aegis enables new sessions to the server
• Uses NetIQ AppManager to configure load balancer
8. Aegis verifies web site health
• Users are accessing the rebooted server successfully and no Response Time or other errors reported on the web farm
9. Send progress notification to Admin
• Include % remaining & ETA for completion
10. Go to Step 3 for next server
• Iterate until all servers completed

Diagram labels: NetIQ AppManager; NetIQ Aegis; Admin; Attachmate WinINSTALL; Active Sessions; Web Servers; Load Balancer
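Steps 3 to 10 of Use Case #5 describe a rolling restart, which the sketch below reproduces in miniature. The `InMemoryBalancer` class is a hypothetical stand-in for a real load balancer, and `restart`, `is_healthy` and `notify` are caller-supplied callbacks; in the slides those actions are carried out through AppManager. Stopping on the first failed check is a design choice of this example, not something the slides state.

```python
class InMemoryBalancer:
    """Stand-in for a real load balancer; tracks state in memory only."""

    def __init__(self, sessions):
        self.sessions = dict(sessions)  # server -> active session count
        self.blocked = set()

    def block(self, server):
        self.blocked.add(server)

    def drain(self, server):
        # A real system would wait for a zero-sessions event; here the
        # users are simply assumed to have left.
        self.sessions[server] = 0

    def enable(self, server):
        self.blocked.discard(server)


def rolling_restart(servers, balancer, restart, is_healthy, notify):
    """Restart servers one at a time, keeping the rest in rotation."""
    completed = []
    for index, server in enumerate(servers, start=1):
        balancer.block(server)   # step 3: no new sessions
        balancer.drain(server)   # step 4: wait for zero active sessions
        restart(server)          # step 5
        if not is_healthy(server):  # steps 6 and 8
            notify(f"{server} failed its check; stopping with it still blocked")
            break
        balancer.enable(server)  # step 7
        completed.append(server)
        percent_left = 100 * (len(servers) - index) // len(servers)
        notify(f"{server} restarted, {percent_left}% of servers remaining")
    return completed
```

Leaving a failed server blocked keeps users away from it while an administrator investigates, at the cost of reduced farm capacity until then.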


Use Case #6: Incident Management

1. Incident occurs
• Performance problem detected by AppManager ResponseTime
2. Collect related events from other data sources
• Changes, tickets, intrusions, etc during same time period
• Broaden scope to other machines in business service and correlate
3. Create helpdesk ticket
• Apply proper classifications
• Embed link to web page with related incidents
4. Helpdesk staff works ticket
• Relevant information already collected & presented with ticket
5. Monitor existing incident management workflow
• Support ticketing workflow with Aegis Investigation Assistance
• Wait for ticket to be resolved (not closed)
6. Initiate Incident Probation Period
• Verify proper service restoration, record in ticket
• Search all tools for unanticipated downstream impacts, reopen ticket if found
7. Coordinate post-incident review for Problem Management
• Request explanatory info from stakeholders, e.g. how well was incident handled, how to prevent recurrence
• Produce unified report for management

Diagram labels: NetIQ AppManager; Helpdesk; NetIQ Aegis; Incident Stakeholders; Management; Ticketing; Other Sources (RFCs, CMDB, NetIQ Change Guardian, etc.)
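Step 2 of Use Case #6, collecting records from other sources in the same time period and across the business service, comes down to a filter on host and time. The sketch below assumes each record is a dict with `source`, `host` and `time` keys and uses a one-hour window on either side of the incident; both the record shape and the window size are assumptions made for the example, not details from the slides.

```python
from datetime import datetime, timedelta


def related_records(incident, service_hosts, records, window=timedelta(hours=1)):
    """Return records near the incident in time, on any host in the service.

    incident is a dict with 'host' and 'time'; records is an iterable of
    dicts with 'source', 'host' and 'time'. Results are sorted by time.
    """
    scope = set(service_hosts) | {incident["host"]}
    matches = [
        r for r in records
        if r["host"] in scope and abs(r["time"] - incident["time"]) <= window
    ]
    return sorted(matches, key=lambda r: r["time"])
```

The sorted result is what a workflow like this might attach to the helpdesk ticket in step 3, so that staff see changes, tickets and intrusions in chronological order.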


Use Case #7: Change Management

1. Change is requested & approved
• Via existing “Request for Change” process
2. Detect approved change request
• Monitor Remedy or other change management system
3. Provision access in change control tool
• Managed by NetIQ Change Administrator
4. Change Requester executes change per approved ticket
• Actions bounded by change control tool
5. Change audit tool detects actual config changes
• Tripwire or NetIQ Change Guardian
6. Reconcile audited changes to the approved RFC
• Group audited changes by time, machine, individual
• Request review of changes: auth or unauth, relevant ticket ID, etc
• Update ticket and CMDB with related changes
7. Perform system health check
• After change, verify proper service levels
8. Correlate changes to impacts
• Search other tools for downstream impacts from change such as performance problems, new vulnerabilities, etc.
9. Coordinate Post-Change Review
• Change is “completed” but not “closed” until the CAB has completed review

Diagram labels: AppManager; All Data Sources (Net. Mgmt, Etc); “Request for Change” Process; NetIQ Aegis; Change Requester; Management; Tripwire, NetIQ Change Guardian, etc; Administrator; NetIQ Change Administrator; CMDB; Incident Stakeholders
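The reconciliation in step 6 of Use Case #7 can be sketched as a match of each audited change against the approved RFCs by machine, individual and time. The dict keys used here (`id`, `machine`, `requester`, `start`, `end` for an RFC; `machine`, `user`, `time` for an audited change) are hypothetical, since the slides do not describe the data exposed by Remedy, Tripwire or NetIQ Change Guardian.

```python
from datetime import datetime


def reconcile(rfcs, audited_changes):
    """Split audited changes into authorized and unauthorized.

    A change counts as authorized when an approved RFC covers the same
    machine and person and the change time falls inside the RFC window.
    Returns (authorized, unauthorized); authorized entries are
    (rfc_id, change) pairs so the ticket can be updated.
    """
    authorized, unauthorized = [], []
    for change in audited_changes:
        ticket = next(
            (rfc["id"] for rfc in rfcs
             if rfc["machine"] == change["machine"]
             and rfc["requester"] == change["user"]
             and rfc["start"] <= change["time"] <= rfc["end"]),
            None,
        )
        if ticket is None:
            unauthorized.append(change)
        else:
            authorized.append((ticket, change))
    return authorized, unauthorized
```

In this sketch the unauthorized list is what would be sent out for review, while the authorized pairs carry the ticket ID needed to update the ticket and CMDB.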


Use Case #8: Vulnerability Management

1. Initiate vulnerability & policy violation scan
• Or scan on an existing schedule
2. Identify resulting vulnerabilities
3. Request permission to remediate via existing Change Management process (RFC)
• Group by machine, service, vulnerability class, etc.
4. Monitor for approved RFC
5. Initiate remediation
• Using provisioning tools such as WinINSTALL, SMS, etc. or by assigned administrator
6. Initiate vulnerability scan to verify remediation
• Verify that vulnerability was indeed remediated
7. Perform system health check
• After change, verify that remediation did not impact service levels
8. Relate changes to impacts
• Search other tools for downstream impacts from change such as performance problems, new vulnerabilities, etc.
9. Close change request
• Or escalate if impacts are found

Diagram labels: AppManager; Remedy; NetIQ Aegis; Secure Configuration Manager; Administrator; All Data Sources (VM, SM, Etc); Patch Manager, WinINSTALL, SMS, Etc
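Two parts of Use Case #8 lend themselves to a short sketch: grouping findings for the RFC in step 3, and comparing scans before and after remediation in step 6. The finding layout (dicts with `machine`, `vuln_id` and `vuln_class`, and `(machine, vuln_id)` tuples for scan results) is an assumption for illustration and does not reflect the output of Secure Configuration Manager or any other scanner.

```python
from collections import defaultdict


def group_findings(findings, key):
    """Group vulnerability IDs by a chosen field, e.g. 'machine' or 'vuln_class'."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding[key]].append(finding["vuln_id"])
    return dict(groups)


def verify_remediation(before, after):
    """Compare two scans given as collections of (machine, vuln_id) pairs."""
    before, after = set(before), set(after)
    return {
        "remediated": sorted(before - after),  # gone after the fix
        "remaining": sorted(before & after),   # still present, escalate
        "new": sorted(after - before),         # introduced since first scan
    }
```

An empty `remaining` list and an empty `new` list would correspond to the "close change request" branch of step 9; anything else points to the escalation branch.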
