Retail Transaction Processing
Year End Review and Recent Issues
RMS January 2007
2006 Net Service Availability
2006 Net Service Availability Through December 31st, 2006
Transaction Processing Web Services Wholesale Total Frequency Control Realtime Balancing Market
99.27% 100% 99.87% 99.95% 99.96%
SLA Target 99.25%
December 2006 Net Service Availability
Transaction Processing Web Services Wholesale Total Frequency Control Realtime Balancing Market
100% 99.97% 100% 97.36% 100%
SLA Target 99.25%
December 2006 Outage Analysis – Breakdown Not Complete
Retail Transaction Processing Availability
December 2006 Retail Transaction Processing Availability Summary
Target – 99.25%
December 2006 Transaction Processing Availability – 97.36%
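To put the December figure in perspective, an availability percentage can be converted into hours of downtime. The sketch below assumes availability is measured against every hour of the 31-day month (744 hours). The slides do not state the measurement window, so that window and the function name `downtime_hours` are illustrative assumptions only.

```python
def downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of unavailability implied by an availability percentage."""
    return period_hours * (1 - availability_pct / 100)

# Assumption: a 24x7 measurement window for December (31 days).
DECEMBER_HOURS = 31 * 24  # 744

actual = downtime_hours(97.36, DECEMBER_HOURS)   # reported December availability
allowed = downtime_hours(99.25, DECEMBER_HOURS)  # availability target

print(f"Implied downtime: {actual:.1f} h")
print(f"Downtime allowed by target: {allowed:.1f} h")
```

Under that assumption, 97.36% corresponds to roughly 19.6 hours of unavailability, against about 5.6 hours allowed by the 99.25% target.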
Retail Transaction Processing Service Availability
• Workshop to be scheduled for February or March depending on availability of space and schedules
• To be discussed:
– Raising service availability target to 99.9%
– Effective date of the availability target increase
– Addition of service degradation metrics & reporting
– Market participant input
• Changes will be made and presented to RMS for approval
December Transaction Processing Issues
• 867 Transaction Processing Issues (multiple occurrences from 12/15 – 12/27)
– Transactions completed ANSI compliance checks but failed to complete TX Set checks
• Market Participant Impact
– Impacted 867 transactions were not forwarded
– Potential delay of completion of service orders
– Potential delay of MP invoicing
– Delayed usage loading to Lodestar could potentially impact initial settlements
– 1 to 7 day delay in reprocessing; majority reprocessed within 2 days
• Root Cause
– PaperFree file server failures; current analysis points to the same root cause as the duplicates issue
• Solution
– Architecture change tested; phase one migration to production was attempted on 1/8 but rolled back due to problems with implementation. Migration to production is now planned for Sunday, January 14th
• Market Notices
– 12/18 - 4:53 pm - Retail Transaction Processing - 867 Transactions
– 12/28 - 4:37 pm - Update: Retail Transaction Processing - 867 Transactions
– 12/29 - 2:35 pm - Update: Retail Transaction Processing - 867 Transactions
– 01/02 - 4:51 pm - Update: Retail Transaction Processing - 867 Transactions
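The failure mode described above, where a transaction clears one validation stage but the next stage does not complete, so the transaction is held for reprocessing rather than forwarded, can be illustrated with a minimal sketch. Everything here is hypothetical: the names `StageIncomplete`, `ansi_check`, `tx_set_check`, and `route`, the dictionary fields, and the simulated failure are placeholders and do not represent the PaperFree implementation or the actual ANSI or TX Set rules.

```python
class StageIncomplete(Exception):
    """Raised when a validation stage cannot finish (hypothetical)."""

def ansi_check(txn):
    # Placeholder for the first stage; it always completes in this sketch.
    return None

def tx_set_check(txn):
    # Placeholder for the second stage; simulate a check that cannot complete.
    if txn.get("simulate_failure"):
        raise StageIncomplete("TX Set check did not complete")

def route(txns, stages):
    """Forward transactions that clear every stage; hold the rest for reprocessing."""
    forwarded, held = [], []
    for txn in txns:
        try:
            for check in stages:
                check(txn)
        except StageIncomplete as exc:
            held.append((txn["id"], str(exc)))
        else:
            forwarded.append(txn["id"])
    return forwarded, held

batch = [
    {"id": "t1"},
    {"id": "t2", "simulate_failure": True},  # clears stage one, stalls in stage two
]
forwarded, held = route(batch, [ansi_check, tx_set_check])
print(forwarded, held)
```

Keeping the held list separate from the forwarded list is what makes later reprocessing possible without resubmitting transactions that already went through.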
December Transaction Processing Issues
• NAESB Outage (12/5)
– Outage attributed to PaperFree file server failures; current analysis points to the same root cause as the duplicates issue and the 867 processing issue
– Market notice sent 12/5
– Attempted fix to be implemented on January 14th
• RBP Stabilization Code Releases (12/14) – Controlled outage
– Multiple outages while code fixes were migrated into production following RBP implementation
– Market notice sent 12/14
• TIBCO Database outage (12/14)
– Partitioning error occurred in the PaperFree to TIBCO database
– Market notice sent 12/14
– Fix complete
• TIBCO Adapter (12/21)
– Following an emergency TIBCO migration to fix a customer care process issue, TIBCO adapters were not turned back on; attributed to a training issue and the learning curve with TIBCO software
– Market notice sent 12/22
– Fix complete
January Transaction Processing Issues
• Database Indexing (1/2)
– Added Q1 2007 partition to incorrect table space with insufficient space available
– Inbound transactions were held while the database partition was pointed to the correct table space and tables were re-indexed
– Fix complete
– Market notices sent 1/2 – 1/4
• Siebel Batch (1/3)
– Service order without ESI_ID caused batch to hold; record was manually skipped to allow batch processing to continue
– Root cause unknown; problem not previously encountered
– SIR written to allow batch processing to continue if encountered in the future
– Market notice sent 1/4
• PaperFree and NAESB Servers Memory Failure (1/3) – Controlled Outage
– Memory failure occurred and the cluster failed over as designed. During replacement of the failed hardware, the cluster did not recognize the new hardware and the replacement required a total outage to reconfigure the cluster.
– Fix complete - hardware replaced in approximately two hours
– Market notice sent 1/4
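The Siebel batch item above describes a change that lets batch processing continue when a service order lacks an ESI_ID. A minimal sketch of that idea follows. The function `process_batch`, the field names `order_id` and `esi_id`, and the logging approach are hypothetical and are not taken from the SIR or from Siebel.

```python
import logging

logger = logging.getLogger("batch")

def process_batch(orders, handle):
    """Process each service order; skip and report any that lack an ESI_ID."""
    skipped = []
    for order in orders:
        if not order.get("esi_id"):
            # Record the problem and move on instead of holding the whole batch.
            logger.warning("Skipping service order %s: missing ESI_ID",
                           order.get("order_id"))
            skipped.append(order)
            continue
        handle(order)
    return skipped

processed = []
orders = [
    {"order_id": 1, "esi_id": "1000"},
    {"order_id": 2, "esi_id": None},  # the kind of record that held the batch
    {"order_id": 3, "esi_id": "1002"},
]
skipped = process_batch(orders, processed.append)
```

Returning the skipped records, rather than silently dropping them, leaves them available for follow-up once the root cause is understood.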
January Transaction Processing Issues
• Siebel to Lodestar Batch (1/3)
– The ‘ESIID service history’ table partition was split to allow Q1 2007 data to populate the database. The split should have worked as performed, but a bug in Oracle 9.2.04 caused a problem
– Market notice not sent because problem was fixed before market was impacted
– Fix complete
• NAESB to PaperFree Communication Failures (1/5)
– PaperFree file server having difficulties pulling data from the NAESB server
– Still under investigation, analysis not complete
– Market notice sent 1/5
Architecture Change & Attempted Fix
• Analysis and troubleshooting of duplicates, PaperFree file server, and 867 forwarding problems has pointed to a potential cause
• Communication protocol used to communicate between PaperFree file servers and PaperFree processor servers drops connections randomly and intermittently, and the PaperFree application experiences multiple problems when this occurs
• Implemented in phases, this architecture change would remove the need for this communication protocol in the retail environment
– Phase One – January 14th
– Phase Two – 7 to 14 days after phase one
• Key points:
– Phased approach to ensure the change is effective and to eliminate multiple simultaneous changes
– Following the change, redundancy would still be in place for the retail environment
Architecture Change & Attempted Fix
[Diagram labels: PF File Server (x2), PaperFree Process Servers, Failover Server, Communication Protocol Problems, Phase One, Phase Two, PF File & Process Server (x3), Process Servers, Clustered (x2)]
Retail Transaction Processing
Questions?