8/6/2019 P200
http://slidepdf.com/reader/full/p200 1/14
EWSD
Reliability
Telecommunication systems have to work 24 hours a day, 365 days a year, with negligible downtime. They must therefore offer an extremely high level of availability and reliability. EWSD has been designed to stringent requirements for reliability and quality of service (QoS).

Contents
– Reliability concept
– Reliability analysis
– Component reliability
– Spare parts
– Hardware reliability analysis
– Software reliability estimation
– Reliability data
– Total system downtime
[Figure: Reliability activities over the product life cycle — requirements and "rough" reliability analysis supporting design decisions (e.g. redundancy) in the concept & design phase; "detailed" reliability analysis (reliability, availability, maintainability) and design reviews during design & development, based on the system specification; function, stress and maintainability tests with verification of parameters during function & system test; data collection via the Quality Information System during field operation & maintenance, covering scheduled and unscheduled maintenance, repair, and total system downtime. Reliability evaluation accompanies every phase.]
Reliability assurance program

Questions of reliability play an important role in development. A reliability assurance program was put in place in order to achieve and maintain such a high level of reliability. This program ensures quality and reliability by continuous monitoring from the beginning of development through modifications and during field operation.

Rough reliability analyses are performed in the concept and design phase to verify the system design and to support system design decisions. During design and development, detailed reliability analyses are performed in order to verify the system design, to optimize the maintenance strategies, and to provide customers with reliability models and results.

During the test phase, reliability data are collected and reliability-related tests are performed. These tests are used to check the related functions and to verify the model parameters used. A Quality Information System (QUIS) has been established in order to monitor, measure and understand the reliability and quality performance in operation. For this purpose, quality and reliability data are collected from systems worldwide and evaluated continuously.
EWSD network nodes are in service with some 300 telephone administrations in more than 100 countries, e.g. the USA, Brazil, China, Indonesia, Egypt, South Africa, Portugal, and Germany. EWSD is well known all over the world for its high reliability and high availability. In the USA, for example, EWSD is the network node with the best total downtime performance based on 1998 FCC ARMIS data.
Reliability objectives

Reliability measurements are defined to provide a high standard of service reliability. The reliability measurements and their objectives consider two aspects:
– the subscriber's/user's point of view, which requires high reliability and availability of services
– the service provider's point of view, which is to keep maintenance and repair work limited

Major reliability measurements are, for example, the total system downtime, the line or trunk downtime, the premature release probability, and the incorrect charging probability, but also the number of maintenance actions or the circuit pack return rate. The different reliability measurements are calculated theoretically by means of reliability modeling, and they are also recorded and evaluated in operation. Typical in-service objectives are listed in the table below.
The theoretically calculated reliability measurements are basically used to evaluate the expected reliability of the system and to give the service provider an idea of its reliability. We have defined essential requirements that cover most of the known requirements from Telcordia/Bellcore, ITU, and specific customers.
Reliability concept

The reliability concept comprises:
• Hardware
• Software
Hardware
In designing the system, particular attention was paid to achieving the highest possible reliability and availability of the system. This is achieved by full redundancy of all central hardware components of the system.

The coordination processor (CP) works as an n+1 redundant multiprocessor, and all other central units are duplicated:
– Common memory (CMY)
– Input/output processors (IOP)
– Disks
– Message buffer (MB)
– Switching network (SN)
– Signaling system network control (SSNC)
– Central clock generator (CCG)

The digital line unit (DLU) is internally duplicated and operates according to the load-sharing principle.
In-service performance (all causes of failure):

Total system downtime          3 minutes per year
Single termination downtime    30 minutes per year
SS7 link downtime              82 minutes per year
Premature release probability  2 × 10⁻⁵
Repair actions (hardware)      5 per 1,000 ports and year
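The downtime objectives above translate directly into availability figures; a sketch of the conversion (the 3 min/year total-system objective corresponds to the better-than-99.9994% availability quoted later for EWSD):

```python
# Convert a yearly downtime objective into an availability figure.
MIN_PER_YEAR = 365.25 * 24 * 60  # = 525,960 minutes

def availability_from_downtime(minutes_per_year: float) -> float:
    """Fraction of the year the system is in service."""
    return 1.0 - minutes_per_year / MIN_PER_YEAR

total_system = availability_from_downtime(3)        # ~0.9999943
single_termination = availability_from_downtime(30)  # ~0.999943
ss7_link = availability_from_downtime(82)
```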
All units are monitored by safeguarding programs to
ensure that errors are dealt with immediately without
impairing operation of the system. Hardware faults are
detected and localized automatically. When a faulty
unit has been localized, it is disconnected and an alarm
is sent. The standby unit of redundant equipment is put
into service by fast service recovery functions.
Coordination processor 113E (CP113E)
The CP113E is a multiprocessor system. All critical units are duplicated to improve system availability. This ensures that an outage of only one unit (a single error) will never cause an outage of the CP113E. To achieve this, the faulty component is immediately localized and replaced by a redundant component as soon as it has been isolated. In many cases, outage of more than one component can also be tolerated without causing outage of the CP. For example, the following devices are duplicated:
– Base processor (BAP)
– Input/output control (IOC)
– Common memory (CMY)
– ATM bridge processor (AMP)
Pool redundancy guarantees sufficient availability of the call processors (CAP). Critical peripheral devices (e.g. the magnetic disk device, MDD) are also duplicated. They are connected to different IOCs by different input/output processors (IOP). In the same manner, the periphery is connected to the CP by one or more pairs of IOPs for the message buffer (MB).

All BAPs, CAPs, IOCs and AMPs are connected to the two CMYs.
One of the two BAPs is the BAP-master and the other is the BAP-spare. If the BAP-master becomes unavailable, the current BAP-spare automatically becomes BAP-master. The BAP-spare usually functions as a CAP and is part of the n+1 pool redundancy of the CAPs. IOC and IOP pairs work in load-sharing mode. If one component of these pairs becomes unavailable, its function is performed by the remaining redundant component without any effect on service. Two AMPs are provided as ATM bridge processors (AMPC) to the SSNC via optical fiber interfaces. They work in active/standby redundancy. Central parts of the processor boards and the CMY are internally duplicated in order to detect failures safely and immediately.

[Figure: CP113E redundancy architecture — active/standby redundancy for AMP0/AMP1, master/slave redundancy for BAP0 (master) and BAP1 (slave), pool redundancy (n+1) for CAP0…CAPn, load-sharing redundancy for the IOC/IOP:MB pairs (IOP:MB0…IOP:MB7), duplicated CMY0/CMY1 and IOC0/IOC1, with connections to LTG, DLU, RSU, SN, MB, CCG, SSNC and NetManager.]
Message buffer (MB) and switching network (SN)
The message buffer (MB) is fully duplicated. The channels are through-connected on both switching network (SN) sides by the associated message buffers MB0 and MB1. As regards the control information to the SN, both MBs operate in active/active mode. The message channels to the LTGs are connected via both SN halves. Although each MB can process the entire data flow to the LTGs, the MBs operate in load-sharing mode. If one MB becomes unavailable, the MB that is still active takes over all the traffic to the LTGs.

The SN is also fully duplicated. One SN half is active and the other is on hot standby. All connections are set up in parallel and on the same path in both SN halves. If an error occurs in the active SN half, the CP initiates a changeover to the standby SN half.
Digital line unit (DLU) and line/trunk group (LTG)
The DLU provides the interface for all subscriber lines to the EWSD system. To meet a high standard of reliability, all central parts of the DLU (DLUC) are duplicated, except the subscriber line modules (SLM) and circuits used for external alarms and line testing.

Both DLUCs operate in load-sharing mode. If one DLUC or a connected LTG fails, calls established via the failed DLUC are lost and the call handling capacity is reduced. The subscribers have to re-establish their calls, which are then routed via the DLUC that is still in operation.

Because of the high reliability of the LTG hardware, the reliability requirements for trunk terminations can be met without redundancy. Nevertheless, it is advisable to distribute the trunks of a trunk group over several LTGs in order to provide maximum service availability.
[Figure: Redundancy structure of SN/MB — SN0 and SN1, each with SNMAT and SNMUX 0…15, in active/hot-standby redundancy; the duplicated MBs with their IOP:MB pairs in active/active load-sharing redundancy; connections to the LTGs and the SSNC.]
Signaling system network control (SSNC)
The redundant structure of the SSNC ensures that no single failure in the central parts will cause a loss of signaling traffic.

The SSNC consists of two duplicated, active/active ATM switching planes, ASN0 and ASN1.

Connected to the ATM switching network (ASN), and communicating via it, are:
– Main processor for administration and maintenance (MP:OAM): a duplicated MP:OAM running in micro-synchronous active/hot-standby redundancy, with two redundant magnetic disk devices MDD0 and MDD1 (active/active redundancy), a magneto-optical disk (MOD, not redundant), a local alarm interface (ALI) and a LAN interface
– Main processor for signaling manager (MP:SM): a duplicated MP:SM for the SS7 signaling manager, running in micro-synchronous active/hot-standby redundancy
– Main processor for signaling link termination (MP:SLT): several duplicated MP:SLTs running in micro-synchronous active/hot-standby redundancy
– Main processor for global title translation (MP:GTT): several duplicated MP:GTTs running in micro-synchronous active/hot-standby redundancy
– Main processor for number portability (MP:NP): several duplicated MP:NPs running in micro-synchronous active/hot-standby redundancy
– Main processor for IP interface (MP:IP): several duplicated MP:IPs running in micro-synchronous active/hot-standby redundancy
– Up to 2 cross-linked fiber-optic links to the MBD
– One cross-linked fiber-optic link to the CP (AMP)
– Line interface card (LIC): several pairs of E1 LICs working in active/standby redundancy
[Figure: DLU/LTG redundancy — subscriber lines terminate on non-redundant SLMs; the duplicated DLUCs operate in load-sharing redundancy and connect via several LTGs to SN0/SN1; trunks terminate directly on the LTGs.]

[Figure: SSNC structure — the MP:OAM, MP:SM, MP:SLT, MP:GTT, MP:NP and MP:IP pairs and the LICs carrying the SS7 links, attached to the ATM switching planes, with links to the AMP (CP) and the MBD.]
The MP:OAM is the central operation, administration and maintenance (OAM) processor of the system, with redundant active/active MDDs for software versions and semi-permanent data. An MOD is provided without redundancy for upgrade purposes and for storing snapshots of the signaling system network control (SSNC) system.

The alarm interface module (ALI) displays up to 16 external alarms locally. Signaling system No. 7 (SS7) is managed via the MP:SM. Several MP:SLTs, MP:NPs and MP:GTTs are provided. For SEP traffic, they communicate directly via the optical fiber interfaces and the message buffer (MB) with the LTGs connected to DLUs.
Clock distribution
The central clock generator (CCG) generates the clock for the EWSD system, synchronizes it to the externally applied reference frequencies, and distributes it to the subsequent equipment.

The clock pulse is distributed in four levels:
I. MB:GCG
II. SN:GCG
III. LTG:GCG
IV. DLU:GCG

Each level consists of one or more parallel clock generators which receive the clock of the next-higher level as a reference clock.

The CCG is duplicated. One CCG is active and the other is on standby, in synchronism with the clock of the active CCG. Each CCG is supplied with two external reference clocks. If the active CCG or both reference clocks fail, a changeover to the standby CCG takes place immediately without loss of synchronization.

The SSNC has a separate clock distribution system, which is also completely redundant. Its inputs are connected to the primary CCG.
[Figure: Clock distribution — the duplicated CCG0/CCG1 (active/standby redundancy) feed the GCGs in MB0/MB1, which in turn feed the GCGs in SN0/SN1, the LTGs and the DLUs; the SSNC planes ASN0/ASN1 with their ACCGs, LIC0/LIC1 and MPU0/MPU1 receive the clock from the primary CCG.]
Software
In terms of adaptability, system modification and system expansion, the EWSD software is modular in structure and functional in organization.

An essential quality attribute of the software is software reliability, which comprises the following aspects:
– technical correctness, completeness
– consistency, integrity, error prevention
– protection against failure, minimization of error propagation
– error neutralization mechanisms (recovery)
– analysis and correction of software errors
– robustness against overload
Technical correctness and completeness are achieved by means of inspections, reviews and tests. The whole development process, from system definition to field operation, is controlled by a quality management system (ISO 9000).

Consistency, integrity, and error prevention are achieved, for example, by the following measures:
– file protection against unauthorized access
– periodic consistency checks for data
– special security measures to prevent multiple access to data
– special security measures for data modification
– validity and consistency checks for data transferred at the interfaces
– checksum procedures for monitoring data and program code
– corrective audits of critical data
Protection against failure and minimization of error propagation are achieved by the following measures, for example:
– division of program code and data into separate link modules, which are likewise stored in separate memory areas
– memory protection for program code and for semi-permanent data
– duplication of system files and user files
– monitoring the real-time response of programs
– monitoring system performance

The aim of recovery is to neutralize an error in such a way that switching operation is either not impaired at all or only slightly. Central and peripheral recovery are divided into recovery levels for this purpose. The individual recovery levels initiate specific recovery actions, which quickly and effectively restore the system to service.
CP recovery levels:

New start
  System function (CP): initialization of processes, stack, process data, heap
  Customer effect: calls and connections maintained; some calls in setup released; call-charge data retained

Initial start
  System function (CP): system reset; CP load from disk
  System function (periphery): LTGs loaded
  Customer effect: switched connections released; nailed-up connections re-established; call-charge data retained

Basic operation
  System function (CP): initialization of vital processes, stack, data; non-essential processes and OA&M functions inhibited
  Customer effect: depends on the recovery level initiating basic operation (new start or initial start)

Initial start with last generation
  System function (CP): reload of the last generation from disk
  System function (periphery): LTGs loaded
  Customer effect: switched connections released; nailed-up connections re-established; call-charge data restored
Most of the software errors occurring on the CP or MPs are neutralized by the affected process itself if possible. The few remaining software errors are neutralized by the first recovery level (new start of all processes not relevant for call processing on the CP).

To ensure that the called recovery level clears an error completely, the system
– supervises the run time of the recovery,
– checks whether further errors occur while recovery is running, and
– checks whether further errors occur again after recovery within a supervision time.

If one of the checks indicates that the called recovery level was not successful, the next-higher recovery level is started. If the lowest level of initial start recovery is not successful either, basic call processing is started. This system status, known as "basic operation", allows a reduced process set to be activated, which guarantees that the basic call processing functions are maintained. In this way, errors in areas of software that are not concerned with call processing are masked out.
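The escalation logic described above can be sketched as follows. This is an illustrative model, not the EWSD implementation; the level names and check outcomes are hypothetical:

```python
# Sketch of recovery escalation: a level succeeds only if it finishes within
# its supervised run time and no further errors occur during recovery or
# within the supervision time afterwards; otherwise the next-higher level
# is started, with "basic operation" as the final fallback.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class RecoveryOutcome:
    runtime_ok: bool     # recovery finished within the supervised run time
    clean_during: bool   # no further errors while recovery was running
    clean_after: bool    # no repeated errors within the supervision time

    def successful(self) -> bool:
        return self.runtime_ok and self.clean_during and self.clean_after

def escalate(levels: List[Tuple[str, Callable[[], RecoveryOutcome]]]) -> str:
    """Run recovery levels from lowest to highest severity; return the one
    that cleared the error, or 'basic operation' if none succeeded."""
    for name, run in levels:
        if run().successful():
            return name
    return "basic operation"
```

For example, if a new start sees further errors while it runs, the system escalates to the initial start; if every level fails its checks, only basic call processing remains active.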
Fault symptom files for problem analysis and a remote-controlled software correction system are used for the analysis and correction of software errors.

Robustness against overload is achieved by means of an overload protection procedure. The overload protection procedures use a step-by-step load rejection strategy. The procedure is designed so that it can differentiate between short-term load peaks, which may be tolerated without any overload protection measure, and long-term overloads.
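A minimal sketch of such a step-by-step load rejection strategy; the threshold, grace period, step size and measurement interval are hypothetical values for illustration, not EWSD parameters:

```python
# Step-by-step load rejection: short load peaks within the grace period are
# tolerated; sustained overload raises the rejection ratio for new calls one
# step per interval, and it is lowered again once the load subsides.
class OverloadProtection:
    def __init__(self, threshold=0.9, grace_ticks=3, step=0.25):
        self.threshold = threshold      # load level considered overload
        self.grace_ticks = grace_ticks  # peaks up to this length are tolerated
        self.step = step                # per-interval change of rejection ratio
        self.over_ticks = 0
        self.reject_ratio = 0.0         # fraction of new calls rejected

    def tick(self, load: float) -> float:
        """Feed one measurement interval's load; return the rejection ratio."""
        if load > self.threshold:
            self.over_ticks += 1
            if self.over_ticks > self.grace_ticks:   # sustained overload
                self.reject_ratio = min(1.0, self.reject_ratio + self.step)
        else:
            self.over_ticks = 0
            self.reject_ratio = max(0.0, self.reject_ratio - self.step)
        return self.reject_ratio
```

With these values a peak lasting up to three intervals causes no rejection at all, while a longer overload sheds an increasing share of new calls until the load drops.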
Reliability analysis

The reliability analysis comprises:
– Component reliability
– Spare parts
– Hardware reliability analysis
– Software reliability estimation
Component reliability

Overall component reliability is based on the reliability of the various items of hardware (resistors, capacitors, ICs, etc.). The failure rates of the components used are calculated on the basis of Siemens norm SN29500. SN29500 contains failure rates of the components for reference conditions, together with methods for considering the dependence of the failure rates on the operating conditions. SN29500 complies with IEC 1709, "Electronic Components – Reliability – Reference conditions for failure rates and stress models for conversion". The basis for SN29500 is worldwide field experience with Siemens products, detailed service and repair statistics, component tests, etc.

The mean failure rate of circuit boards is in the range of 2,000 to 6,000 FIT (failures in 10^9 hours), corresponding to an MTBF of roughly 60 to 20 years.
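The FIT-to-MTBF relation used above is a direct unit conversion: FIT counts failures per 10⁹ device-hours, so the MTBF in hours is 10⁹/FIT. A small sketch:

```python
# FIT <-> MTBF conversion: 2,000 FIT -> ~57 years, 6,000 FIT -> ~19 years,
# matching the "roughly 60 to 20 years" range quoted for circuit boards.
HOURS_PER_YEAR = 8766  # 365.25 days

def fit_to_mtbf_years(fit: float) -> float:
    """MTBF in years for a board with the given failure rate in FIT."""
    return 1e9 / fit / HOURS_PER_YEAR

def mtbf_years_to_fit(years: float) -> float:
    """Inverse conversion: failure rate in FIT for a given MTBF in years."""
    return 1e9 / (years * HOURS_PER_YEAR)
```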
SSNC recovery levels:

Local recovery of an MP platform (FULLREC, code only)
  System function (MP): initialization of processes, stack, process data, heap
  Customer effect: any failed MP:SM links are restored; nailed-up connections re-established

Local recovery of an MP platform (code & data)
  System function (MP): platform reset; MP load from disk
  System function (periphery): LIC or ACCG is loaded (code & data)
  Customer effect: any failed MP:SM links are restored; nailed-up connections re-established

Basic operation
  System function (MP): initialization of vital processes, stack, data; non-essential processes and OA&M functions inhibited
  Customer effect: depends on the recovery level initiating basic operation (local recovery with code only, or with code and data)

System-wide recovery (LOADREC2)
  System function (MP): system reset; MP:SA load from disk
  System function (periphery): all ACCGs and LICs are loaded (code & data)
  Customer effect: all links are interrupted and restored; nailed-up connections re-established

Initial start with last generation
  System function (MP): reload of the last generation from disk
  System function (periphery): all ACCGs and LICs are loaded (code & data)
  Customer effect: all links are interrupted and restored; nailed-up connections re-established
Spare parts

Network nodes are systems in which component failures, and thus also device failures, can be expected and which therefore require corrective maintenance. This gives rise to a certain demand for spare parts and the need to maintain stocks of such parts (in this case spare modules), either by repairing faulty modules or by ordering new ones from time to time from the manufacturer.

The failure rates of the individual modules and the number of modules installed can be used to calculate the probability of a certain number of module failures occurring within a particular period.

The cumulative Poisson distribution is used for calculating the required number of spare modules. Essential customer-specific parameters for this calculation are the required service continuity probability and the turnaround time, which is defined as the interval between the time when a replacement is ordered and the time when the replacement is received. The spare parts requirements are calculated individually for each project.
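The calculation described above can be sketched as follows: the required stock is the smallest number of spares s for which the cumulative Poisson probability of at most s failures during the turnaround time reaches the required service continuity probability. The parameter values in the example call are illustrative, not taken from the document's figure:

```python
# Spare-module dimensioning via the cumulative Poisson distribution.
import math

def required_spares(n_modules: int, fit: float, turnaround_hours: float,
                    continuity_prob: float) -> int:
    """Smallest s with P(X <= s) >= continuity_prob, where X ~ Poisson(lam)
    and lam is the expected number of failures during the turnaround time."""
    lam = n_modules * fit * 1e-9 * turnaround_hours  # FIT = failures / 1e9 h
    s = 0
    term = math.exp(-lam)           # P(X = 0)
    cumulative = term
    while cumulative < continuity_prob:
        s += 1
        term *= lam / s             # P(X = s) from P(X = s - 1)
        cumulative += term
    return s

# e.g. 1,000 modules at 5,000 FIT, one-month (~730 h) turnaround, 99.9%
spares = required_spares(1000, 5000, 730, 0.999)
```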
Hardware reliability analysis

Reliability analysis and modeling are an integral part of the development process. Reliability block diagrams and state transition diagrams are used for hardware reliability modeling. The models consider all aspects of the system that affect its reliability, for example the ability of the system to detect faults, the ability to identify a faulty unit and isolate it, or the frequency of periodic diagnosis. Hardware failure rates of all components are predicted at an early stage of the development process. All predictions are based on the Siemens norm SN29500 for component failure rate calculations.

The figure shows the simplified reliability block diagram relating to the total system downtime. Reliability block diagrams of this kind are created for each specified reliability measurement, such as total system downtime, single termination outage, or SS7 link downtime.
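Evaluating such a block diagram reduces to combining parallel (redundant) and series blocks. A sketch for a series chain of duplicated units, with a hypothetical per-unit availability and assuming independent failures of the two halves of each pair:

```python
# Reliability block diagram evaluation: a pair of redundant units is up
# unless both halves fail; a series chain is up only if every block is up.
from math import prod

def pair_availability(a_unit: float) -> float:
    """Availability of two redundant units in parallel."""
    return 1.0 - (1.0 - a_unit) ** 2

def series_availability(block_availabilities) -> float:
    """Availability of blocks connected in series."""
    return prod(block_availabilities)

# Seven duplicated blocks, as in the diagram (CCG, BAP, CMY, AMP, IOC/IOP,
# SN, MB); 0.9999 is a hypothetical per-unit value, not an EWSD figure.
unit_availability = 0.9999
blocks = [pair_availability(unit_availability) for _ in range(7)]
system_availability = series_availability(blocks)
```

The duplication is what makes the numbers work: each pair's unavailability is the square of the unit unavailability, so even modest units yield a highly available chain.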
[Figure: Required spare modules as a function of the module failure rate (1,000 to 5,000 FIT) and the number of installed modules (100 to 5,000), for a 99.9% service continuity probability and a turnaround time of 1 month.]

[Figure: Simplified reliability block diagram for the total system downtime — a series chain of the duplicated blocks CCG, BAP, CMY, AMP, IOC/IOP (CP113E incl. clock), SN and MB, together with the SSNC.]
The individual subsystems are modeled by Markov modeling techniques. The reliability model of a subsystem (Markov model) shows all failure, detection, recovery and repair actions relevant for the reliability of the subsystem. Examples are:
– failure of both sides of the system
– uncovered faults
– failure on the active system side during repair of the other side
– non-redundant operation of the system during periodic automatic diagnosis of the redundant side, and the effect of a failure during this period

The reliability analysis finishes with the computation of the defined reliability measurements and their verification against the requirements. Additionally, the effect of the choice of parameter values on the resulting reliability measurements is analyzed in order to determine optimized system parameters. Examples are the minimum necessary fault detection probability or the optimized period for periodic diagnosis of redundant components.

During system integration testing, dedicated test steps are used to verify the reliability structure and the correctness of the reliability models. For this purpose, hardware faults are inserted to study how the system behaves if a hardware fault occurs. Parameters used in the hardware reliability models, such as switchover performance or fault detection probability, are evaluated statistically.
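A much-reduced version of such a Markov model can be computed in a few lines: three states (both units up, one unit down, system down), hypothetical failure/repair rates and coverage, and a steady-state solution of the generator matrix. The real models contain many more states (travel to the site, diagnosis, undetected faults, and so on):

```python
# Toy steady-state Markov model for a duplicated unit.
import numpy as np

lam = 1e-4   # per-unit failure rate [1/h]  (hypothetical)
mu = 0.5     # repair rate [1/h], i.e. 2 h mean repair time (hypothetical)
c = 0.99     # coverage: probability a failure is detected and switched over

# States: 0 = DUPLEX (both up), 1 = SIMPLEX (one down, covered),
#         2 = DOWN (system outage: double failure or uncovered fault)
Q = np.array([
    [-2 * lam,  2 * lam * c,  2 * lam * (1 - c)],  # transitions from DUPLEX
    [mu,        -(mu + lam),  lam],                # transitions from SIMPLEX
    [0.0,       mu,           -mu],                # repair from DOWN
])

# Solve pi @ Q = 0 with sum(pi) = 1 by replacing one balance equation
# with the normalization condition.
A = np.vstack([Q.T[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

availability = pi[0] + pi[1]             # system is up in DUPLEX and SIMPLEX
downtime_min_per_year = pi[2] * 8766 * 60
```

Even this toy model reproduces the qualitative behavior discussed above: the downtime is dominated by the uncovered-fault term, which is why the minimum necessary fault detection probability is one of the optimized parameters.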
[Figure: Example of a typical Markov model for a redundant system — starting from normal operation of the redundant units, transitions lead to SIMPLEX states (detected or undetected fault on one unit, travel to the site, on-site repair, routine diagnosis with switchover) and to DOWN states (uncovered fault, second failure during repair, down until the end of diagnosis). Legend: λc = failure rate of common parts, λm = failure rate of minor faults, µrepair = repair rate, µtravel = travel rate, µroutine = routine rate, µdia = diagnosis rate, c = coverage probability, d = detection probability, r = remote repair probability.]
Software reliability estimation

The reliability of software used in telecommunication networks is a crucial determinant of network reliability. Software reliability estimation is an important element of software reliability management. In particular, software reliability estimations guide the system testing process and decisions on the release of software.

Software errors are errors of logic, not of equipment; it is therefore possible to achieve 100% reliability for small programs. The average size of program modules in the software does not as a rule exceed 1,000 statements, so they can be regarded as small modules.
Software reliability increases with testing time as error corrections are made in response to failures. During the test phase, weekly progress reports on error finding and error fixing activities are provided. These reports are used as measurements of software quality during the test phase. The measurements are compared with defined software quality objectives at given milestones.

Software reliability models, which assume that the cumulative number of software errors increases exponentially towards an asymptotic value, are applied to evaluate the software reliability and quality. With the aid of the software reliability model it is possible to estimate the number of errors in a software product and to estimate the testing time and testing resources required to reach a predefined quality level.
A software error is defined as a departure of program operation from the specification, caused by a software problem. Because of the different characteristics and effects of software errors, and because of the error-tolerant software architecture, only a very small proportion of software errors affects the reliability of the system.

The downtime due to software errors essentially depends on the frequency and duration of the recovery levels affecting the service capability of the switch.

For the estimation of the software reliability of a new system version, detailed recovery statistics from versions in the field worldwide, recovery statistics from the test bed, and recovery runtime estimations and measurements are used.

Worldwide statistics show that the share of software-related failures in the total system downtime is approximately 1 min/year on average.
[Figure: Software quality tracking during the test phase — cumulative fixed failures vs. still unfixed failures over test time, with separate curves for priority 1 failures (quality criterion for milestone B600: no occurrence of uncorrected priority 1 errors); the failure intensity curve approaches the failure intensity objective, indicating the additional test time required, while the cumulative failure curve approaches the total number of failures found to date.]
Reliability data

Total system downtime

The total system downtime due to hardware and software failures amounts on average to less than 1 hour in 20 years (3 min/year). This corresponds to an overall system availability of more than 99.9994 percent.

Hardware: The MTBF of the system due to hardware faults has been calculated at more than 600 years. The mean accumulated downtime is calculated in the range of 0.01 to 0.05 min/year, depending on the assumed repair time.

Software: Field performance measurements show that less than one software error in 3 years requires a recovery level affecting the service capability of the system for longer than 30 seconds. In such a case the service capability can be restored in approx. 3 minutes on average. Thus the share of software-related failures in the total system downtime is less than 1 min/year on average.
[Figure: Total system downtime model (MTTR = 0.5 h example) — series combination of CCG (MTBF 33,848 years, downtime 0.0009 min/year), CP113E (4,672 years, 0.0034 min/year), SN/MB (6,371 years, 0.0026 min/year) and SSNC (1,044 years, 0.0039 min/year), with

MTBF_total = 1 / Σ (1/MTBF_i) = 736 years
Total system downtime = Σ Downtime_i = 0.0177 min/year]
                     MTTR = 0.5 h         MTTR = 2 h           MTTR = 4 h
                     (0 h travel,         (1.5 h travel,       (3 h travel,
                      0.5 h repair)        0.5 h repair)        1 h repair)
                     MTBF    Downtime     MTBF    Downtime     MTBF    Downtime
                     years   min/year     years   min/year     years   min/year
CCG                  33,848  0.0009       33,511  0.0036       33,073  0.0072
CP113E  small         5,136  0.0030        5,059  0.0052        4,941  0.0089
        large         4,672  0.0034        4,600  0.0059        4,488  0.0101
SN/MB   small         3,384  0.0050        3,229  0.0093        3,035  0.0184
        large         6,371  0.0026        6,209  0.0045        6,006  0.0080
SSNC    small         1,030  0.0088        1,020  0.0107        1,006  0.0155
        large         1,044  0.0039        1,031  0.0059        1,013  0.0113
Total   small           671  0.0177          659  0.0288          643  0.0500
        large           736  0.0099          726  0.0163          711  0.0294
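The totals follow from the subsystem values: downtimes simply add up along the series chain, while the MTBFs combine harmonically. A sketch verifying this for the "small" configuration at MTTR = 0.5 h:

```python
# Combine subsystem MTBFs (harmonic sum) and downtimes (plain sum) for the
# "small" configuration at MTTR = 0.5 h, reproducing the table's totals.
subsystems = {            # (MTBF in years, downtime in min/year)
    "CCG":    (33848, 0.0009),
    "CP113E": (5136, 0.0030),
    "SN/MB":  (3384, 0.0050),
    "SSNC":   (1030, 0.0088),
}

mtbf_total = 1.0 / sum(1.0 / mtbf for mtbf, _ in subsystems.values())
downtime_total = sum(dt for _, dt in subsystems.values())
# mtbf_total ~ 671 years, downtime_total ~ 0.0177 min/year
```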
Trunk downtime

The unavailability encountered by an individual trunk depends on failures of the central equipment as described above and on failures in the peripheral equipment. The estimates show that the mean accumulated intrinsic downtime (MAIDT) for an individual termination will be less than 15 minutes per year for hardware and software faults. Thus the relevant ITU recommendation (Q.541), which requires less than 30 minutes per year, is met comfortably.

Due to the full redundancy of all central equipment, the unavailability of an individual termination is determined by the non-redundant parts (LTG).

Hardware: The MTBF for an individual trunk due to hardware faults has been calculated at more than 23 years. The mean accumulated downtime is calculated in the range of 2 to 14 min/year, depending on the assumed repair time and on the LTG used (LTGN or LTGP).

Software: Field performance measurements show that fewer than 0.5 software errors per year per LTG require a recovery level causing a service interruption to the directly connected trunks, with a duration of about 2 minutes on average.
[Figure: Trunk downtime model (LTGP, MTTR = 0.5 h example) — total outage (MTBF 671 years, downtime 0.018 min/year), LTG access via SN/MB (1,718 years, 0.011 min/year) and the LTGP itself (16 years, 1.872 min/year) in series:

MTBF_total = 1 / Σ (1/MTBF_i) = 16 years
Trunk downtime = Σ Downtime_i = 1.901 min/year]
                      MTTR = 0.5 h        MTTR = 2 h          MTTR = 4 h
                      (0 h travel,        (1.5 h travel,      (3 h travel,
                       0.5 h repair)       0.5 h repair)       1 h repair)
                      MTBF   Downtime     MTBF   Downtime     MTBF   Downtime
                      years  min/year     years  min/year     years  min/year
System (large)          671  0.018          659  0.029          643  0.050
SN/MB (LTG access)    1,718  0.011        1,384  0.032        1,091  0.095
LTGP (4 x LTG func.)     16  1.872           16  6.444           16  12.886
LTGN                     31  1.392           31  3.648           31  6.789
Trunk (LTGP)             16  1.901           15  6.505           15  13.031
Trunk (LTGN)             29  1.421           29  3.709           29  6.934
Subscriber line downtime

The unavailability encountered by an individual subscriber line depends on failures of the central equipment as described above and on failures in the peripheral equipment. The estimates show that the mean accumulated intrinsic downtime (MAIDT) for an individual termination will be less than 15 minutes per year for hardware and software faults. Thus the relevant ITU recommendation (Q.541), which requires less than 30 minutes per year, is met comfortably.

Due to the full redundancy of all central equipment, the unavailability of an individual termination is determined by the non-redundant parts (SLM).

Hardware: The MTBF for an individual subscriber line has been calculated at more than 5 years. The mean accumulated downtime is calculated in the range of 3 to 13 min/year, depending on the assumed repair time.

Software: The probability that both DLU controls or the associated LTGs will fail at the same time is negligible.
[Figure: Subscriber line downtime model (analog line via SLMA, DLUG and LTGP, MTTR = 0.5 h example) — total outage (MTBF 671 years, downtime 0.018 min/year), LTG access via SN/MB (1,718 years, 0.011 min/year), DLUG-LTGP (39 years, 0.766 min/year) and the SLMA (19 years, 1.575 min/year) in series:

MTBF_total = 1 / Σ (1/MTBF_i) = 13 years
Subscriber line downtime = Σ Downtime_i = 2.362 min/year]
                      MTTR = 0.5 h        MTTR = 2 h          MTTR = 4 h
                      (0 h travel,        (1.5 h travel,      (3 h travel,
                       0.5 h repair)       0.5 h repair)       1 h repair)
                      MTBF   Downtime     MTBF   Downtime     MTBF   Downtime
                      years  min/year     years  min/year     years  min/year
System (large)          671  0.018          659  0.029          643  0.050
SN/MB (LTG access)    1,718  0.011        1,384  0.032        1,091  0.095
DLUG-LTGP (includes
load-sharing failure
modes)                   39  0.766           39  3.066           39  6.133
SLMA (32 lines)          19  1.575           19  2.912           19  5.823
SLMD (16 lines)          14  2.078           14  4.686           14  9.372
Analog line              13  2.362           12  6.026           12  12.081
ISDN line                10  2.865           10  7.801           10  15.630
Copyright (C) Siemens AG 2001
Issued by Information and Communications Group • Hofmannstraße 51 • D-81359 München
Technical modifications possible. Technical specifications and features are binding only insofar as they are specifically and expressly agreed upon in a written contract.
Order Number: A30828-X1160-P200-1-7618 Visit our Website at: http://www.siemens.com