Design and Implementation of TWAREN Hybrid Network
Management System
National Center for High-Performance Computing
Speaker: Ming-Chang Liang & Li-Chi Ku
Outline
Introduction
Motivation
Issues
Design
Implementation
Future works
INTRODUCTION
About TWAREN
TWAREN (TaiWan Advanced Research & Education Network) network construction was completed at the end of 2003 and started its operation and service in the beginning of 2004.
In its initial phase, IP routing was the main service provided.
Network management relied on the programs that came with the purchased network equipment, including CIC, Webtop, CW2K, HP OpenView, HP NNM and other solutions.
Initial phase of TWAREN
Figure: topology of the initial phase of TWAREN. Core node labels: Taipei, Hsinchu, Taichung, Tainan. Site labels: ASCC, NDHU, NCTU, NTHU, NCHU, NYSU, NCU, CCU, NCCU, NTU, NHLTC, NTTU, MOECC. Device labels: C7609, C6509, GSR. Link labels: 10GE, STM-64/OC-192, STM-16/OC-48, GE, EBT 10GE.
Initial phase of NMS
Figure: architecture of the initial-phase NMS. Device labels: 3750, 2600, 15454, 15600, 2522, 7609, NAM, 12416. Tool labels: CW2K (DFM), NNM, CTM, ISM, Cisco InfoCenter, Probe, Gateway, WebTop, Remedy Help Desk, Notification. Access method labels: Trap, PING, Polling, CLI, API. Monitored protocol labels: HTTP, FTP, SMTP, DNS.
Phase 2 of TWAREN
At the end of 2006, TWAREN was upgraded with more protection methods and better availability; the result is called TWAREN phase 2.
Tens of optical switches and hundreds of lightpaths then served as the foundation of the layer 2 VLAN services and the layer 3 IP routing services.
In 2008, tens of VPLS switches were further incorporated to provide an additional multipoint VPLS VPN service.
The layer 1 lightpaths can be protected by SNCP, the layer 2 VLANs by spanning tree recalculation, and the layer 2 VPLS by fast reroute technology.
All these improvements transform TWAREN phase 2 into a true hybrid network capable of providing multiple layers of services and high availability.
Architecture of TWAREN phase 2
Figure: architecture of TWAREN phase 2. Core node labels: Taipei, Hsinchu, Taichung, Tainan (NCHC), with 7609C, 15454, 15600 and 12816 devices. Site labels: NTU, NCU, NSYSU, NCHU, NCTU, NTHU, ASCC, NCCU, NCKU, CCU, MOEcc, NCNU, NIU, NDHU, NHLTC, NTTU, with 6509, 7609, 15454 and 3750 devices. Link labels: STM64, STM16, 10GE, GE.
MOTIVATION
Why do we need a new NMS?
The architecture of TWAREN phase 2 became more and more complicated.
Because TWAREN phase 2 has more protection methods, a single hardware or circuit failure will not interrupt the service provided to the end users.
The initial-phase NMS was no longer adequate for the hybrid network, because it was hard to determine and predict the correlation between failures and affected services.
Requirements for new NMS
Automatically determine the correlation between failures, affected services, affected customers and severity level on this highly protected network.
Provide a single integrated visual user interface.
Use integrated database, logs, message flows and exchange protocols.
After several surveys, we decided to develop a new NMS suitable for monitoring all services provided by TWAREN phase 2.
ISSUES
Uncertainty of SNMP implementation
SNMP trap and MIB implementations differ even among equipment of the same brand.
The SNMP OIDs or the returned values may change after an OS upgrade on the same equipment, and such changes are usually hard to discover beforehand.
Therefore, the system must be designed in a way such that these changes can be accommodated with minimal modifications.
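One way to absorb such changes is to keep OIDs and value scaling in a data table rather than in code, so that an OS upgrade only needs a new table row. The sketch below is an illustration under assumptions, not the TWAREN implementation: the table layout, the names `OID_RULES`, `resolve_rule` and `normalize`, and the OID strings (which use a made-up enterprise number) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRule:
    oid: str            # OID to poll (placeholder values below, not real vendor OIDs)
    scale: float = 1.0  # factor applied to the raw value to get a common unit


# Hypothetical rule table: (device model, OS version prefix) -> metric -> rule.
# An empty prefix acts as the default for every OS version of that model.
OID_RULES = {
    ("7609", "12.2"): {"cpu_5min": MetricRule("1.3.6.1.4.1.99999.1.1")},
    ("7609", "15."): {"cpu_5min": MetricRule("1.3.6.1.4.1.99999.2.1")},
    ("15454", ""): {"temperature": MetricRule("1.3.6.1.4.1.99999.3.1", scale=0.1)},
}


def resolve_rule(model: str, os_version: str, metric: str) -> MetricRule:
    """Return the rule whose OS prefix is the longest match for this device."""
    best = None
    for (rule_model, prefix), metrics in OID_RULES.items():
        if rule_model != model or metric not in metrics:
            continue
        if os_version.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, metrics[metric])
    if best is None:
        raise KeyError(f"no rule for {model} {os_version} {metric}")
    return best[1]


def normalize(rule: MetricRule, raw_value: float) -> float:
    """Convert a raw polled value into the common unit used by the NMS."""
    return raw_value * rule.scale
```

With this layout, a collector would call `resolve_rule` once per device and metric, and supporting a new OS release means adding one entry to the table.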
The lack of skilled programmers
Our programmers are also the members of the operating team.
We are not professional programmers and do not share a common programming language.
The system must be partially available and operational during the early phase of its development, so that it can evolve along with the real needs.
Therefore, a unified standard of communication between different modules is necessary.
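A unified communication standard can be as simple as a fixed message envelope that every module reads and writes, whatever language it is written in. The slides do not say which format TWAREN chose; the sketch below assumes JSON, and the `Event` class and its field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field

REQUIRED_FIELDS = {"source", "kind", "component_id", "timestamp"}


@dataclass
class Event:
    source: str        # module that produced the message, e.g. "trap_collector"
    kind: str          # message type, e.g. "link_down"
    component_id: int  # key into the component table
    timestamp: float   # seconds since the Unix epoch
    payload: dict = field(default_factory=dict)  # free-form, module-specific data

    def to_json(self) -> str:
        """Serialize to the shared wire format."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Event":
        """Parse a message and reject it if a required field is missing."""
        data = json.loads(text)
        missing = REQUIRED_FIELDS - data.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return cls(**data)
```

Because the envelope is plain JSON, a module written in another language only has to produce and parse the same fields.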
Huge historical data and computing
Baseline thresholds have much better quality when they are generated dynamically from historical data, which helps minimize the false positive and false negative rates.
Therefore, we need to store sufficiently large historical data sets and retrieve them very efficiently when calculating those thresholds.
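As an illustration of a dynamically generated threshold, the sketch below derives a band from the mean and standard deviation of past samples. The slides do not state which statistic the system uses, so the "mean plus or minus k standard deviations" rule, the default `k=3.0`, and the function names are assumptions.

```python
from statistics import mean, pstdev


def baseline_threshold(history, k=3.0):
    """Return (low, high) bounds computed from historical samples."""
    if len(history) < 2:
        raise ValueError("need at least two samples to build a baseline")
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation of the history
    return mu - k * sigma, mu + k * sigma


def is_anomalous(value, history, k=3.0):
    """True when the new sample falls outside the historical band."""
    low, high = baseline_threshold(history, k)
    return value < low or value > high
```

For example, with room temperature samples that hover around 20, a reading of 30 falls outside the band while a reading of 21 does not.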
Automatically determine affected services and customers
TWAREN phase 2 inherently has the ability to guard against a single point of hardware or circuit failure, so the failure is less likely to affect the actual service provisioning.
An intelligent management system that can determine which services a failure affects will reduce the management cost.
DESIGN
1st Stage System Architecture
Figure: 1st stage system architecture. Database labels: Current Status DB, Long Term DB, Threshold DB, Case/Action DB. Data collector labels: Traps, MIBs, Syslogs, Net flows, Telnet/SSH, TL1. Module labels: Monitor Objs, Fault Detection, Threshold Analyzer, Fault Location, Auto Action, Control API, GUI & Ticket System, Report System. Other labels: Mirror, Interactive, Passive.
Relationship of Data Tables
Basic Data Tables: Component, People, Location, Unit, Vendor, etc.
Relationship Tables: Circuit, VLAN Services, VPLS Services, ONS Light Path, ONS Cross Connection, etc.
Basic Data Tables
Component Data Table
Component_ID  Parent_C_ID  Name
1             0            TN7609P
12            1            Slot_1
2             0            TP15454
16            2            Slot_3
135           12           Port_9

People Data Table
ID  Name  Phone       Address  Service_Time  Service_WeekDay
1   John  0939123123  xxxxxxx  8-17          1,3,5
2   Mary  0958123123  xxxxxxx  ALL           ALL

Location Data Table
ID  Name   Address
1   MOEcc  xxxxx
2   NTU    xxxxx

Unit Data Table
ID  Name
1   NCKU
18  THU

Vendor Data Table
ID  Name
1   CHT
2   APBT
3   RingLine
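The Component Data Table stores a hierarchy through Parent_C_ID: a port points to its slot, and a slot to its chassis. The sketch below walks that chain using the sample rows from the slide. It assumes that a Parent_C_ID of 0 marks a top-level device; the function names are hypothetical.

```python
# Sample rows from the Component Data Table: id -> (parent id, name).
COMPONENTS = {
    1: (0, "TN7609P"),
    12: (1, "Slot_1"),
    2: (0, "TP15454"),
    16: (2, "Slot_3"),
    135: (12, "Port_9"),
}


def full_name(component_id: int) -> str:
    """Build a chassis/slot/port path by following Parent_C_ID to the root."""
    parts = []
    current = component_id
    while current != 0:  # assumption: 0 means "no parent"
        parent, name = COMPONENTS[current]
        parts.append(name)
        current = parent
    return "/".join(reversed(parts))


def descendants(component_id: int) -> list:
    """Return every component below the given one, e.g. all parts of a chassis."""
    found = []
    stack = [component_id]
    while stack:
        node = stack.pop()
        for cid, (parent, _name) in COMPONENTS.items():
            if parent == node:
                found.append(cid)
                stack.append(cid)
    return sorted(found)
```

Here `full_name(135)` yields `TN7609P/Slot_1/Port_9`, and `descendants(1)` lists the slot and port that would be affected if the whole chassis failed.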
Relationship Data Tables
Circuit Data Table
ID  Name                 Vendor  Identify  From_CID  To_CID  Bandwidth
1   Taipei_Tainan_STM64  1       8D543267  13        35      STM64
2   NCHU_NCNU_10GE       2       ST16987   23        67      10GE

ONS Topology Link Table
NodeA  NodeB  PortA  PortB
12     45     1467   2346
16     32     2312   3421

ONS Cross Connection Table
CRS  PortA  PortB  SNCP_CRS  ChannelA  ChannelB  Size
482  1744   1756   0         5         13        4
21   3321   3343   24        17        33        16
24   3546   4534   21        1         17        16

ONS Light Path Table
LP  PortFrom  PortTo  SNCP_LP  CRS_Trace        Size
2   2312      2345    0        359,556,522,475  4
98  3434      4455    99       482,541,335      16
99  3434      4455    98       482,469,541,335  16
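The ONS Light Path Table carries enough information to estimate the impact of a failed cross connection. The sketch below uses the sample rows from the slide and rests on two assumptions that the slides do not confirm: an SNCP_LP of 0 means the lightpath has no protection peer, and a lightpath survives when its SNCP peer does not traverse any failed cross connection. The function name and the status strings are hypothetical.

```python
# Sample rows from the ONS Light Path Table.
LIGHT_PATHS = {
    2: {"sncp_lp": 0, "crs_trace": [359, 556, 522, 475]},
    98: {"sncp_lp": 99, "crs_trace": [482, 541, 335]},
    99: {"sncp_lp": 98, "crs_trace": [482, 469, 541, 335]},
}


def lightpath_impact(failed_crs):
    """Map each lightpath that crosses a failed CRS to 'protected' or 'down'."""
    failed = set(failed_crs)
    # Lightpaths whose trace includes at least one failed cross connection.
    hit = {
        lp for lp, row in LIGHT_PATHS.items()
        if failed & set(row["crs_trace"])
    }
    impact = {}
    for lp in sorted(hit):
        peer = LIGHT_PATHS[lp]["sncp_lp"]
        # Protected only if a peer exists and the peer itself is not hit.
        if peer != 0 and peer in LIGHT_PATHS and peer not in hit:
            impact[lp] = "protected"
        else:
            impact[lp] = "down"
    return impact
```

With these rows, a failure of cross connection 469 only touches lightpath 99, which is covered by its peer 98, while a failure of 482 hits both members of the pair, and a failure of 359 takes down the unprotected lightpath 2.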
IMPLEMENTATION
Current monitor objects
Trap monitor: used interfaces, BGP, etc.
Environment of equipment room: temperature (auto threshold), voltage
Statuses of equipment: temperature, CPU, RAM, fans, power supply
BGP peering with other networks: statuses, number of exchanged routes (auto threshold), utilization analysis
Performance monitor: end-to-end RTT (auto threshold), end-to-end packet loss rate (auto threshold), end-to-end availability
Throughput: backbone (auto threshold), designated interfaces, top N bytes, flows, packets
Routes monitor: the routes of customers (exact comparison)
VPLS VPN: throughput of CE side, MACs of VPN
Optical network: current topology of lightpaths
VLAN: current topology of VLAN
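The "exact comparison" used by the routes monitor can be pictured as a set difference between the prefixes a customer is expected to announce and the prefixes actually observed. The sketch below is an assumption about how such a check might look, not the system's code; the function name is hypothetical and the prefixes in the usage note are documentation addresses.

```python
import ipaddress


def compare_routes(expected, observed):
    """Report prefixes that are missing or unexpected (exact match only)."""
    exp = {ipaddress.ip_network(p) for p in expected}
    obs = {ipaddress.ip_network(p) for p in observed}
    return {
        "missing": sorted(str(n) for n in exp - obs),
        "unexpected": sorted(str(n) for n in obs - exp),
    }
```

For instance, comparing the expected list `["192.0.2.0/24", "198.51.100.0/24"]` with the observed list `["192.0.2.0/24", "203.0.113.0/24"]` reports `198.51.100.0/24` as missing and `203.0.113.0/24` as unexpected. A more specific or less specific prefix does not count as a match, which is what makes the comparison exact.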
Future works
Combine all developed monitor objects into a single integrated visual user interface.
Enhance the monitoring of optical, VPLS and VLAN networks.
Automatically determine the fault location, root cause and affected scope.
Minimize the false positive and false negative rates.