Network management (NM) functions: Configuration, Performance, Fault, Accounting, Security


Configuration Management
• Middle- and long-range activities for:
  - controlling physical, electrical and logical inventories
  - maintaining vendor files and trouble tickets
  - supporting provisioning and order processing
  - defining and supervising service level agreements
  - managing changes
  - distributing software

• Configuration management is central to all other network management functions
  - all other management functions are supported by configuration details
  - enhances control over configuring the network and its devices
  - gives quick access to vital configuration data
  - helps initialization, maintenance and shutdown of individual components and logical subsystems

Primary Information
• Actual configuration
• Attributes of network elements
• Generated configuration
• Status indicators of network elements
• Vendor data
• Change requests and records
• Order data
• Actual inventory
• Status of service-level indicators

Secondary Information
• Traffic volumes
• More details on indicators
• Performance indicators of the network elements
• etc.

Configuration management functions

• Inventory management

• Network topology services

• Service Level agreements

• Designing, implementing and processing trouble tickets

• Order processing and provisioning

• Change Management

Inventory Management
• Automated inventory: an online record of
  - currently implemented components and spares
  - vendor contacts
  - location of components
  - maintenance requirements for certain equipment classes
  - service statistics such as the number of outages, response time for repair and repair time distribution

Good Inventory Management
• Less redundancy: if the same information is stored in different databases, resources are wasted, as is the processing time needed to back up those databases
• Synchronized change management
• Unique names and addresses

Helps during troubleshooting
• Efficient troubleshooting
• Better capacity and contingency planning

Network Topology Services
• Requires current and historical configurations
• Layered configuration displays (electrical, physical and logical layouts) at network and component level

[Figure: Display of configuration details. Network backbone view with T1, T1/T3 and T3 links; network details view (click on a node icon); protocol level view (PHY, DLC).]

Auto-Discovery Tool
• An auto-discovery tool can discover devices on the network (periodically)
• Auto-mapping produces the network map
• Executing discovery and mapping consumes network bandwidth
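Auto-discovery and auto-mapping can be pictured as a graph traversal. The sketch below assumes the tool can obtain each device's neighbor list. That list is represented here by a hypothetical in-memory table `NEIGHBORS`, not by a real SNMP or CDP query. From a seed device, it builds the set of devices and links that form the network map. Each neighbor lookup stands in for one management query, and those queries are where the bandwidth cost comes from.

```python
from collections import deque

# Hypothetical neighbor tables, standing in for what a discovery tool
# might learn by polling each device (device names are illustrative).
NEIGHBORS = {
    "router-a": ["switch-1", "switch-2"],
    "switch-1": ["router-a", "host-1", "host-2"],
    "switch-2": ["router-a", "host-3"],
    "host-1": ["switch-1"],
    "host-2": ["switch-1"],
    "host-3": ["switch-2"],
}


def discover(seed, neighbors):
    """Breadth-first discovery from a seed device.

    Returns the discovered devices and the undirected links between them,
    which together form a simple network map.
    """
    seen = {seed}
    links = set()
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        # One lookup per device: in a real network this is a management query.
        for peer in neighbors.get(device, []):
            links.add(frozenset((device, peer)))
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen, links


devices, links = discover("router-a", NEIGHBORS)
print(sorted(devices))
print(len(links), "links")
```

Running discovery periodically and comparing successive maps would show devices that appeared or disappeared between runs.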

SLA
• Need to evaluate long-term service levels
• Consistency in customer service levels
• Increased planning and decreased crisis management
• Service levels: responsiveness, accuracy, availability
• Performance reporting: planned and actual workload characteristics and service levels during the report period

Trouble Tickets
• Linking trouble tickets
• Information in a trouble ticket:
  - time reported
  - time received by responsible group
  - time network service restored
  - time vendor notified
  - time vendor responded
  - time vendor restored service
  - total vendor time
  - total user non-availability
  - total service outage
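Some fields of a trouble ticket are time stamps, and others are totals derived from them. The sketch below is a minimal record type with illustrative field names. It assumes two definitions that the list itself does not give: total vendor time runs from vendor notification to vendor restoration, and total service outage runs from the first report until network service is restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TroubleTicket:
    # Time stamps from the list above; field names are illustrative.
    reported: datetime
    received_by_group: datetime
    vendor_notified: datetime
    vendor_responded: datetime
    vendor_restored: datetime
    service_restored: datetime

    def total_vendor_time(self) -> timedelta:
        # Assumed definition: vendor notification until vendor restoration.
        return self.vendor_restored - self.vendor_notified

    def total_service_outage(self) -> timedelta:
        # Assumed definition: first report until network service is restored.
        return self.service_restored - self.reported


t = TroubleTicket(
    reported=datetime(2024, 1, 8, 9, 0),
    received_by_group=datetime(2024, 1, 8, 9, 10),
    vendor_notified=datetime(2024, 1, 8, 9, 30),
    vendor_responded=datetime(2024, 1, 8, 10, 0),
    vendor_restored=datetime(2024, 1, 8, 11, 30),
    service_restored=datetime(2024, 1, 8, 11, 45),
)
print(t.total_vendor_time(), t.total_service_outage())
```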

Change Management
[Figure: User request → Study impact → Plan change → Schedule → Request OK? → Execute → Document, supported by the configuration and inventory database.]

Tools for Configuration Management
• Simple tools
  - provide simple storage for all network-related information
  - data is collected and entered manually
• Complex tools
  - automatically gather data, giving the latest configuration information
  - compare the current configuration with the stored configuration
  - change a device's configuration while it is running
  - specify configuration errors that should generate warning messages
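A complex tool's "compare current configuration with stored configuration" step can be sketched with a text diff. The example below uses Python's standard `difflib.unified_diff` on two hypothetical configurations (the hostnames and addresses are made up). It prints a warning when they differ. How the configurations are fetched from devices is left out.

```python
import difflib

# Hypothetical stored (baseline) and current device configurations.
stored = """hostname edge-1
interface eth0
 ip address 10.0.0.1/24
 shutdown
""".splitlines()

current = """hostname edge-1
interface eth0
 ip address 10.0.0.2/24
 shutdown
""".splitlines()


def config_diff(stored, current):
    """Return unified-diff lines between the stored and current configuration."""
    return list(difflib.unified_diff(stored, current,
                                     fromfile="stored", tofile="current",
                                     lineterm=""))


diff = config_diff(stored, current)
if diff:
    # A complex tool might raise a warning message at this point.
    print("WARNING: configuration differs from stored copy")
    print("\n".join(diff))
```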

Performance Management
• Activities required to continuously evaluate principal performance indicators in order to:
  - check that service levels are maintained
  - identify potential bottlenecks
  - establish trend reports on network utilization and error rates

Contd..
• Involves:
  - collecting data on the current utilization of network devices and links
  - analyzing the data to discern high-utilization trends
  - setting utilization thresholds
  - using off-line simulation and/or analytical studies on how to maximize performance
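Collecting utilization data usually means sampling a byte counter twice and dividing the bits transferred by what the link could have carried in that interval. The sketch below follows that idea for one direction of a link, in the style of IF-MIB octet counters such as `ifInOctets` (RFC 2863). It assumes a 32-bit counter that wraps at most once between samples. The helper names are hypothetical, and so is the 80% threshold.

```python
COUNTER_MAX = 2**32  # 32-bit octet counters wrap around at this value


def link_utilization(octets_t0, octets_t1, interval_s, speed_bps):
    """Fraction of link capacity used in one direction over an interval."""
    delta = (octets_t1 - octets_t0) % COUNTER_MAX  # tolerates one counter wrap
    return (delta * 8) / (interval_s * speed_bps)


def over_threshold(samples, threshold=0.8):
    """Return the indices of samples whose utilization exceeds the threshold."""
    return [i for i, u in enumerate(samples) if u > threshold]


# 60 s sample on a hypothetical 10 Mbit/s link: 45,000,000 octets transferred.
u = link_utilization(1_000_000, 46_000_000, 60, 10_000_000)
print(f"{u:.0%}")  # 60%
```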

Primary Information
• Actual configuration
• Generated configuration
• Performance indicators in real time or near real time:
  - response time
  - congested channels
  - resource utilization
• Selected vendor data
• Performance histories for selected facilities
• Operational procedures

Performance Indicators
• Availability
• Response time
• Throughput
• Utilization (channel occupancy)
• Grade of service
• Transmission volumes
• Offered load
• Accuracy

Indicators
• Service-oriented indicators: have priority
• Efficiency-oriented indicators

Service-Oriented Indicators
• Availability
  - seen from the customer's perspective
  - depends on the technical reliability of components
  - redundancy?
• Cost benefit: total costs = cost of redundancy + cost of consequences

Availability

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTD} + \mathrm{MTTR} + \mathrm{MTOR}}$$

• MTBF: mean time between failures
• MTTD: mean time to diagnose
• MTTR: mean time to repair (or report)
• MTOR: mean time of repair
• For better availability, keep MTTD, MTTR and MTOR low
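The availability formula above translates directly into code. The sketch assumes that all four times are expressed in the same unit. The hours used in the example are hypothetical.

```python
def availability(mtbf, mttd, mttr, mtor):
    """Availability = MTBF / (MTBF + MTTD + MTTR + MTOR); all in the same unit."""
    return mtbf / (mtbf + mttd + mttr + mtor)


# Hypothetical figures in hours.
a = availability(mtbf=1000, mttd=2, mttr=5, mtor=3)
print(round(a, 4))  # 0.9901
```

Shrinking any of the three downtime terms raises the result, which is why the slide recommends keeping them low.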

Response Time
• Propagation delays, processing delays, transmission delays, protocol delays

[Figure: Timeline between user and system: think time, enter time, network time, system response time and output response time, together forming the end-user response time.]

Contd..
• Total response time includes:
  - network delays
  - processing delays
  - protocol delays (time-outs)
• Response time considerations depend on:
  - protocols and their behavior
  - job priorities
  - loads in the system
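A rough one-way network delay can be estimated by summing the delay components listed above. Transmission delay is frame size divided by line rate, and propagation delay is distance divided by signal speed. The sketch assumes a signal speed of about 2e8 m/s, a common approximation for copper or fiber. The frame size, line rate, distance and processing time in the example are hypothetical: 12 ms + 10 ms + 2 ms = 24 ms.

```python
def transmission_delay(size_bits, rate_bps):
    """Time to put all bits of a frame onto the line."""
    return size_bits / rate_bps


def propagation_delay(distance_m, speed_mps=2e8):
    """Time for the signal to travel the distance (~2e8 m/s assumed)."""
    return distance_m / speed_mps


def network_delay(size_bits, rate_bps, distance_m, processing_s, protocol_s=0.0):
    """One-way delay as the sum of transmission, propagation, processing and protocol delays."""
    return (transmission_delay(size_bits, rate_bps)
            + propagation_delay(distance_m)
            + processing_s
            + protocol_s)


# 12,000-bit (1500-byte) frame over a 1 Mbit/s line spanning 2,000 km,
# with 2 ms of processing.
d = network_delay(12_000, 1_000_000, 2_000_000, 0.002)
print(f"{d * 1000:.1f} ms")
```

On a slow line like this one, transmission delay dominates. On a fast long-haul line, propagation delay would dominate instead.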

Accuracy
• Accuracy can be affected by:
  - erroneous transmission (wireless and fiber)
  - characters transmitted but not delivered
  - characters received that were not sent
  - duplicated characters

$$\text{Residual error rate} = \frac{\mathrm{CHE} + \mathrm{CHV} + \mathrm{CHN} + \mathrm{CHD}}{\mathrm{CHT}}$$

• CHE = erroneous characters due to media and processing
• CHV = characters transmitted but not received
• CHN = extra characters received
• CHD = duplicated characters
• CHT = total characters
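The residual error rate above can be computed from the five character counts. The counts in the example are hypothetical values for one measurement period.

```python
def residual_error_rate(che, chv, chn, chd, cht):
    """(CHE + CHV + CHN + CHD) / CHT, using the character counts defined above."""
    if cht <= 0:
        raise ValueError("CHT (total characters) must be positive")
    return (che + chv + chn + chd) / cht


# Hypothetical counts for one measurement period.
rer = residual_error_rate(che=12, chv=3, chn=4, chd=1, cht=1_000_000)
print(rer)  # 2e-05
```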

Efficiency-Oriented Indicators
• Efficiency-oriented indicators represent the interests of the organization
• Do service-oriented monitoring and efficiency-oriented monitoring conflict?

[Figure: Efficiency vs. service. Service levels L1 to L3 plotted against efficiency (CPU busy, channel busy, line busy), with efficiency marks at 30%, 40% and 70%.]

Throughput
• Measure of a server's capacity: MIPS
• Line throughput: kilobits/sec
• Application-oriented:
  - number of transactions per unit time
  - number of customer sessions per application
  - number of calls serviced
  - number of jobs served by a node

Utilization
• Dynamic measure of the resources used
• Puts a practical limit on throughput under operational conditions
• Helps study overlap among component processing, mutual waits, etc.

Utilization
• Utilization vs. accuracy
• Utilization vs. throughput
• Utilization vs. goodput

[Figure: Link utilization and errors per second, each plotted against time in seconds.]

Overlap effects
[Figure: Input subsystem, CPU, output subsystem and output link; is a slow link the limiting factor?]

Availability
• The availability of a system depends on the availability of its individual components (availability is very difficult to measure and report on)
  - check each component and compare with the configuration
  - depends on how the components are connected

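Example
• Each component availability = 0.98
• Configuration 1: two modems in series (serial processing of data)
  - availability of the serial combination = 0.98 * 0.98 = 0.9604 ≈ 0.96
• Configuration 2: two links in parallel
  - probability that one link is not available = 0.02
  - probability that both links are not available = 0.02 * 0.02 = 0.0004
  - availability = 1 - 0.0004 = 0.9996

[Figure: Configuration 1 shows two components A in series; Configuration 2 shows two components A in parallel.]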
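The two configurations in the example follow two general rules. In series, every component must be up, so availabilities multiply. In parallel, the combination is down only if every component is down, so the availability is 1 minus the product of the unavailabilities. The sketch assumes that components fail independently.

```python
from math import prod


def serial_availability(components):
    """All components must work: multiply the availabilities."""
    return prod(components)


def parallel_availability(components):
    """At least one component must work: 1 - P(all fail), assuming independence."""
    return 1 - prod(1 - a for a in components)


print(round(serial_availability([0.98, 0.98]), 4))    # 0.9604
print(round(parallel_availability([0.98, 0.98]), 4))  # 0.9996
```

Mixed topologies can be handled by nesting the two functions, for example a parallel pair placed in series with a single component.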

Performance Measurements
• Data gathering: exhaustive or statistical
• Distribution of sampling times
• Correlation effects
• Performance analysis: data presentation, interpretation

Contd..
• Historical trends
• Real-time trends
• Graphical presentation and comparison
• Linking different performance indicators, then setting thresholds
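One simple way to turn historical trends into thresholds is to fit a line to past measurements and project when it will cross a limit. The sketch below swaps in ordinary least-squares linear regression as that technique. The slides do not name one. The weekly peak-utilization series and the 80% limit are hypothetical.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def weeks_until(threshold, xs, ys):
    """Project when the fitted trend reaches the threshold (None if not rising)."""
    a, b = linear_trend(xs, ys)
    if b <= 0:
        return None
    return (threshold - a) / b


# Hypothetical weekly peak utilization of one link (fraction of capacity).
weeks = [0, 1, 2, 3, 4]
peak = [0.50, 0.52, 0.54, 0.56, 0.58]
print(weeks_until(0.80, weeks, peak))  # about week 15
```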

Simulation Studies
• To improve performance or identify bottlenecks, model the network and its components (primary approach)
  - study the effects of changes in the model
  - target optimal performance
  - requires synthetic traffic generation
• Analytical and simulation tools
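Simulation and analysis can check each other on a model simple enough to have a closed-form answer. The sketch below models a single link or server as an M/M/1 queue, an assumption chosen for illustration. Synthetic traffic with exponential inter-arrival and service times is generated, and Lindley's recursion gives each customer's waiting time. The simulated mean time in the system is then compared with the analytical value 1/(mu - lam). The rates are hypothetical.

```python
import random


def simulate_mm1(lam, mu, n_customers, seed=1):
    """Mean time in system for an M/M/1 queue, using Lindley's recursion.

    Synthetic traffic: exponential inter-arrival times (rate lam) and
    exponential service times (rate mu).
    """
    rng = random.Random(seed)
    wait = 0.0   # queueing delay of the current customer
    total = 0.0
    for _ in range(n_customers):
        service = rng.expovariate(mu)
        total += wait + service                # time in system = wait + service
        gap = rng.expovariate(lam)             # time until the next arrival
        wait = max(0.0, wait + service - gap)  # Lindley's recursion
    return total / n_customers


lam, mu = 0.5, 1.0              # hypothetical arrival and service rates
analytical = 1 / (mu - lam)     # M/M/1 mean time in system
simulated = simulate_mm1(lam, mu, 200_000)
print(f"analytical={analytical:.2f} simulated={simulated:.2f}")
```

Once the simulation agrees with the analytical result, the model can be changed (for example, a higher arrival rate) to study effects for which no closed form exists.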

Simple Tools for PM
• Provide real-time information on network components
  - graphical: bars, histograms
• Can help find bottlenecks
• Main information:
  - processor utilization
  - memory utilization
  - link: packets/sec, bits/sec
  - bit error rates

Complex Tools
• Set thresholds
• Take action once thresholds are exceeded: alarm, enable backup
• Near-threshold warning
• Store historical data
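The complex-tool behavior above can be sketched as a small monitor. It issues a warning near the threshold, calls an action once the threshold is exceeded, and keeps a history of every sample. The 10% warning margin is a hypothetical parameter, as is the `on_alarm` callback that stands in for "raise alarm / enable backup". Neither comes from a specific product.

```python
def check(value, threshold, warn_margin=0.1):
    """Classify a measurement against a threshold.

    'alarm' once the threshold is exceeded, 'warning' when within
    warn_margin (a hypothetical fraction of the threshold) below it,
    otherwise 'ok'.
    """
    if value > threshold:
        return "alarm"
    if value >= threshold * (1 - warn_margin):
        return "warning"
    return "ok"


history = []  # stored historical data for later trend analysis


def monitor(value, threshold, on_alarm):
    """Check one sample, record it, and call on_alarm if it exceeds the threshold."""
    state = check(value, threshold)
    history.append((value, state))
    if state == "alarm":
        on_alarm(value)  # e.g. raise an alarm and enable a backup link
    return state


actions = []
for v in [0.55, 0.74, 0.83]:
    monitor(v, 0.8, on_alarm=lambda val: actions.append(f"backup enabled at {val:.0%}"))
print(history)
print(actions)
```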

A Complex Tool at Work
• Performance problem: brief periods of interrupted service between systems (no information passes through) at 3 pm and 12 am

[Figure: Systems Daisy and Gatsby and the mainframe.]

PM Tool at Work
• Check error rates in the network: normal
• Check utilization: peaks at 3 pm and 12 am (times of backup)
• Check Gatsby and Daisy utilization: peaked to 100% at those times
• Check for processor-intensive applications: negative

Contd..
• Check the network traffic type
  - located packets of an unknown protocol flooding the network (locating servers)
  - check the originator
  - send a message to the originator, or block the originator's traffic

Fault Management
• Activities needed to dynamically maintain the network service level
• High network availability

Primary Information
• Actual configuration
• Generated configuration
• Event reports and alarms
• Status indicators of network elements
• Performance indicators
• Spare components and their status
• Backup routes and their status
• Vendor data for problem dispatch
• Global traffic volumes
• Progress of trouble resolution

Steps in FM
• Identify the occurrence of a fault
• Isolate the cause of the fault
• Correct the fault if possible
• The first is difficult; the second is very difficult!

Network Status Supervision
• Layered configuration maps showing status (tightly coupled to the topology display)
• Zoom in on parts to isolate problems
• Real-time traffic status displays
• Good monitoring devices/sensors
• Monitored information passed on to agents or management elements
• Process and distribute messages, events and alarms

Status
• A measurement of the behavior of an object at a specific instant in time
  - represented by a set of status information items and their values at a specific time

[Figure: Network status vs. element status for Elements 0, 1 and 2, with status items "CSU1 down", "CSU2 down" and "No carrier".]

Event
• A change in the status of an element that justifies notification, i.e. one that is significant to fault management
• An event report can be generated, containing:
  - type of event
  - change in status
  - time stamp
  - reporting entity (object or process that generated the event)
  - managed object whose status changed
  - managed object information
  - probable cause
  - effect of the event on the managed object
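Events are defined above as changes in status. A sketch can therefore compare two status snapshots and emit one event report per changed item. The `EventReport` type below carries only a subset of the listed fields; its field names and the status values are illustrative. Items that did not exist in the previous snapshot are deliberately skipped.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EventReport:
    # A subset of the event report fields listed above; names are illustrative.
    event_type: str
    managed_object: str
    old_status: str
    new_status: str
    timestamp: datetime
    reporting_entity: str


def detect_events(previous, current, reporter, now):
    """Generate an event report for every item whose status changed.

    Items absent from the previous snapshot are skipped in this sketch.
    """
    events = []
    for obj, status in current.items():
        old = previous.get(obj)
        if old is not None and old != status:
            events.append(EventReport("status-change", obj, old, status, now, reporter))
    return events


before = {"CSU1": "up", "CSU2": "up", "carrier": "present"}
after = {"CSU1": "down", "CSU2": "up", "carrier": "absent"}
for e in detect_events(before, after, "element-0-agent", datetime(2024, 1, 8, 15, 0)):
    print(e.managed_object, e.old_status, "->", e.new_status)
```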

Event Filtering
• Multi-layered filtering

[Figure: Events (E) from activity on the network pass through (1) a threshold filter, (2) a grouping filter and (3) a prioritizing filter, producing prioritized problems (P).]

[Figure: Filtering process. Bit errors over time, annotated "investigated, no action", "investigated", "action taken" and "action effective".]
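The three filter layers can be chained as plain functions. The sketch below assumes several hypothetical inputs: per-kind thresholds, a priority table in which a lower number means more urgent, and events shaped as `(element, kind, value)` tuples. The threshold filter drops minor events. The grouping filter merges repeated events from the same element into one problem. The prioritizing filter orders the remaining problems.

```python
from collections import defaultdict

# Hypothetical raw events: (element, kind, value).
events = [
    ("link-1", "bit_errors", 3), ("link-1", "bit_errors", 120),
    ("link-1", "bit_errors", 150), ("link-2", "bit_errors", 90),
    ("router-a", "cpu", 97), ("link-3", "bit_errors", 5),
]

THRESHOLDS = {"bit_errors": 50, "cpu": 90}  # assumed per-kind thresholds
PRIORITY = {"cpu": 1, "bit_errors": 2}      # assumed: lower number = more urgent


def threshold_filter(evts):
    # Layer 1: drop events that stay at or below their threshold.
    return [e for e in evts if e[2] > THRESHOLDS[e[1]]]


def grouping_filter(evts):
    # Layer 2: one problem per (element, kind), with event count and worst value.
    groups = defaultdict(list)
    for element, kind, value in evts:
        groups[(element, kind)].append(value)
    return [(el, kind, len(vals), max(vals)) for (el, kind), vals in groups.items()]


def prioritizing_filter(problems):
    # Layer 3: order by priority, then by number of grouped events (most first).
    return sorted(problems, key=lambda p: (PRIORITY[p[1]], -p[2]))


problems = prioritizing_filter(grouping_filter(threshold_filter(events)))
for problem in problems:
    print(problem)
```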

Filtering Process
• Global filtering
  - the first process applied to an event: is the event serious, and does it have to be processed?
  - uses a set of criteria for this assessment
  - cannot be function-specific

Filtering Process
• Distribution filtering
  - an event processor selects the events it wishes to receive
  - various event processors run simultaneously
• Event process filtering
  - filtering done by the event processor
  - specific to the functional area

Event Processor
• Examines and processes event reports
• Passive processing: sampling and logging
• Proactive processing: takes automatic corrective action

Process for Filtering
[Figure: Event reports pass through global filtering into the event distribution unit, which uses distribution subscription services to place events in queues (Q) feeding several event processors.]
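The global-filtering and distribution-subscription flow can be sketched as a small class. Every event report first passes one global filter. It is then appended to the queue of every event processor whose subscription selector accepts it. The class name, the event dictionaries and the severity scale are all hypothetical.

```python
from collections import deque


class EventDistributionUnit:
    """Global filtering, then delivery to subscribed event processors' queues."""

    def __init__(self, global_filter):
        self.global_filter = global_filter
        self.subscribers = []  # (selector, queue) pairs

    def subscribe(self, selector):
        """An event processor registers which events it wishes to receive."""
        q = deque()
        self.subscribers.append((selector, q))
        return q

    def report(self, event):
        if not self.global_filter(event):  # not serious enough to process
            return
        for selector, q in self.subscribers:
            if selector(event):
                q.append(event)


# Hypothetical events are dicts with 'severity' (1 = lowest) and 'kind'.
edu = EventDistributionUnit(global_filter=lambda e: e["severity"] >= 2)
fault_q = edu.subscribe(lambda e: e["kind"] == "link_down")
perf_q = edu.subscribe(lambda e: e["kind"] == "threshold")

edu.report({"kind": "link_down", "severity": 3})
edu.report({"kind": "threshold", "severity": 1})  # dropped by the global filter
edu.report({"kind": "threshold", "severity": 2})
print(len(fault_q), len(perf_q))  # 1 1
```

Each event processor would drain its own queue and could apply its own, function-specific filtering on top. That step corresponds to event process filtering.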

Event Effect
• Permanent – external action required

• Temporary – will correct automatically

• Impending – will result in failure soon

• Impaired – services can be provided at reduced levels

• Inhibited – services stopped

Dynamic Troubleshooting
• Opens trouble tickets, links them, dispatches them to the proper vendors and checks the online progress of trouble tickets
• Problem detection: is something wrong?
• Problem determination: what is wrong, and where is the problem in the network?
• Problem diagnosis and resolution: isolate and fix, or provide a backup and then fix

End-to-End Testing
• Verifies correct network operation dynamically
  - conducted during normal network operation, without affecting it
• Can we have overhead-free testing?
• What components should be tested?
• How should tasks be assigned? Local sites, central sites

Contd..
• When to monitor and test? Continually, periodically, on demand
• How to monitor and test? Disruptive, non-disruptive
• What indicators to monitor and test? Service level, efficiency, loops, circuits
• What instruments to use? Hardware, software, analog, digital
• What reports are to be generated? Standard, ad hoc with special evaluations
• What are the triggering events? Time, single or combined events, alarms

Types of Faults
• Unobservable
  - deadlocks between processes
  - instrument not capable of recording the events
• Partially observable
  - a node failure is observed, but the actual reason lies in a low-level protocol
• Uncertainty in observation
  - lack of device response: the device may be down, the network partitioned, delayed by congestion, or the local timer faulty

Issues in Isolating Faults
• Multiple potential faults: a number of elements failing
• Too many related observations: one fault manifests itself as various events
• Interference between diagnosis and local recovery procedures: error recovery sets in before diagnosis
• Absence of automated tools

Example FM
• Problem scenario: Sergeant fails due to buffer overflow

[Figure: Sergeant, Pepper, the network management system and LAN1, LAN2, LAN3.]

Contd..
• The buffer in Sergeant is well provisioned, but it fails due to a traffic surge
• Pepper reports a link failure to LAN3; a message is sent to the NM system
• The NMS asks Pepper to check for carrier presence on the link to LAN3: carrier absence reported
• The NMS asks Pepper to perform a loopback on link 3: OK

Contd..
• The NMS resets Sergeant
• ?
• The actual reason for the failure is not identified
• This could have been avoided if Sergeant had generated an event when its utilization exceeded 80% or 90%

Simple Tool
• Points out the existence of a problem
  - e.g. ICMP ping tells you whether a system exists (is reachable)
• A complex tool may perform all the functions shown in the previous example
