CAUSES OF FAILURE IN WEB APPLICATIONS

Feroz Zahid, Simula Research Laboratory & UiO

Report Details

Authors: Soila Pertet and Priya Narasimhan
Published by: Parallel Data Laboratory, Carnegie Mellon University
Year of Publication: 2005
Type of Contribution: New Analysis
Purpose: The report investigates the causes and prevalence of failures in web applications, based on case studies and actual website outage data collected from different sources.

Report Overview

Summary of Findings

• Failure Types
  • Software failures and human errors make up 80% of all failures
• Causes of Software Failures
  • Maintenance, upgrades
  • System overload, resource exhaustion and complex fault-recovery mechanisms
• Downtime
  • Ranges from a few minutes to weeks
  • Fault chains increase downtime
  • Planned downtime is about 80% of total downtime
  • Planned downtime may also cause unplanned downtime
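The downtime shares above come from summing outage durations by category. The following is a minimal Python sketch of that arithmetic. It also computes the usual availability ratio, which is 1 minus the total downtime divided by the observation period. The `Outage` record and the sample durations are hypothetical and are not data from the report.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One outage record (hypothetical schema, not from the report)."""
    minutes: float
    planned: bool


def planned_share(outages):
    """Fraction of total downtime that was planned (0.0 if there was no downtime)."""
    total = sum(o.minutes for o in outages)
    if total == 0:
        return 0.0
    return sum(o.minutes for o in outages if o.planned) / total


def availability(outages, period_minutes):
    """Availability = 1 - (total downtime / observation period)."""
    downtime = sum(o.minutes for o in outages)
    return 1.0 - downtime / period_minutes


# Hypothetical month of outages: 30 days = 43,200 minutes.
log = [Outage(240, True), Outage(120, True), Outage(60, False), Outage(30, False)]
print(f"planned share: {planned_share(log):.0%}")  # prints "planned share: 80%"
print(f"availability:  {availability(log, 30 * 24 * 60):.4%}")
```

The sample log is constructed so that planned work accounts for 360 of the 450 downtime minutes. This mirrors the kind of split the summary describes.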

Findings: What are the causes of failures?

• Software failures
• Human/operator errors
• Hardware/environmental failures
• Security violations/breaches

Software Failures

Software Error Type       Examples
Resource Exhaustion       Memory leaks, Buffer overflows
Logical Errors            Corrupt Pointers, Race Conditions, Deadlocks
System Overload           Flash Crowd, Slashdot Effect
Recovery Code             Complex Fault-recovery, Backup restores
Failed Software Update    Upgrade Dependencies, Configuration errors
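Resource exhaustion is the easiest category in the table above to reproduce in miniature. The sketch below models a leaked resource, such as a file descriptor, using a toy descriptor table with a hard limit. `DescriptorTable`, its limit and the handler names are hypothetical. The sketch only mimics the real operating-system limit and its "too many open files" error; it does not touch actual descriptors.

```python
from contextlib import contextmanager


class DescriptorTable:
    """Toy stand-in for a per-process descriptor table with a hard limit (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.limit:
            raise OSError("too many open files")  # mimics the OS EMFILE error
        self.in_use += 1

    def release(self):
        self.in_use -= 1

    @contextmanager
    def handle(self):
        """Acquire a descriptor and always release it, even if the body raises."""
        self.acquire()
        try:
            yield
        finally:
            self.release()


def leaky_request(table):
    table.acquire()  # bug: the descriptor is never released


def safe_request(table):
    with table.handle():
        pass  # work with the descriptor here


def requests_until_exhaustion(handler, limit=8, max_requests=100):
    """Count how many requests succeed before the table runs out."""
    table = DescriptorTable(limit)
    for n in range(max_requests):
        try:
            handler(table)
        except OSError:
            return n
    return max_requests


print(requests_until_exhaustion(leaky_request))  # 8: the ninth request fails
print(requests_until_exhaustion(safe_request))   # 100: never exhausts
```

The leaky handler keeps working until the limit is reached, and then every later request fails. This is why such bugs often surface only after long uptimes.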

Software Failure – Example Incidents

• System Software
  • PlanetLab – Bug in updated kernel module – Detected by user reports
  • America Online – Server upgrade – Intermittent outages – Several weeks
  • Symantec – Mar 2005 – Patch for a DNS cache-poisoning (redirection) vulnerability – Attackers redirected traffic to malware websites
  • Zopewiki.org – Jul 2004 – Memory leaks – Workaround was to reboot the web server daily – Detected by performance slowdown

Software Failure – Example Incidents

• Application Failures
  • Resource Exhaustion – PlanetLab – Nodes hang due to an application bug that exhausted file descriptors
  • Logical Error – AOL – Dec 2004 – A number of AIM accounts deactivated during a regular maintenance cycle – Several days of downtime for the users
  • Logical Error – Pricing error on Amazon's UK site lists the iPaq Pocket PC at under $12 (regular price: $449) – 2.5 hours affected – Detected by abnormally high sales volumes
  • Site Overload – Comair Airlines – Cancels over 1,000 flights when a surge in crew flight reassignments knocks down its flight reservation system
  • Integration – HP and Compaq implementations of SAP software – Loss of $400 million in revenue – 6 weeks (3 weeks planned)
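The Amazon pricing error was detected through abnormally high sales volumes. The report does not say how that detection works. One simple way such a check could work is a k-sigma threshold over recent history, and that approach is assumed here. The function name, the threshold k = 3 and the sales figures below are all hypothetical.

```python
from statistics import mean, stdev


def is_abnormal(history, current, k=3.0):
    """Flag `current` if it exceeds the historical mean by more than k sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate a standard deviation
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase is unusual
    return (current - mu) / sigma > k


# Hypothetical hourly unit sales for one product.
hourly_sales = [4, 6, 5, 7, 5, 6, 4, 5]
print(is_abnormal(hourly_sales, 6))    # False: within the normal range
print(is_abnormal(hourly_sales, 250))  # True: e.g. a mispriced item selling out
```

A fixed threshold like this ignores daily and seasonal patterns. A production monitor would more likely compare against the same hour on previous days.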

Software Failure – Example Incidents

• Databases
  • Basecamphq.com – Feb 2005 – Database flagged a table as read-only – 30 minutes of downtime
  • Walmart.com – Apr 2001 – Database glitches – 9 hours of downtime
  • Sony – June 2003 – Star Wars Galaxies game – Overwhelming traffic – Intermittent database errors for one day
• Recent example (not in the paper) – London airports – Dec 2014 – NATS – Inconsistency: the transition between the two states caused a failure in the system

Human/Operator Errors

Human Error Type           Examples
Configuration Errors       Sysadmin mistakes
Procedural Errors          Failure to back up, typos
Miscellaneous Accidents    Accidentally disconnecting the power supply

Human Errors – Example Incidents

• Configuration Error – Microsoft – An incorrect configuration change in edge routers caused downtime of Microsoft websites lasting from several hours to 1 day
• Configuration Error – MSN – Messages from Earthlink and RoadRunner mistakenly marked as spam – Operator error
• Procedural Error – Gforge3 – Failure to restart the database daemon after applying a database patch – Several hours of downtime
• Miscellaneous – eBay – Electrician accidentally knocked out a plug – Battery ran out 30 minutes later – System outage

Hardware/Environment Failures

Failure Type              Examples
Hardware Failures         Crashed hard disks, burnt circuits
Environmental Failures    Power outages, Overheating

Hardware Failures – Example Incidents

• Equipment Failure – Wall Street Journal website – Mar 2004 – Hardware failure – 1 hour of downtime
• Equipment Failure – Yahoo Groups – Mar 2002 – Hardware problems – Several hours of downtime
• Power Outage – eBay – Power outage in a web hosting facility – 3 hours of downtime
• Hardware Upgrades – iWon – Installation of $2 million worth of new hardware – Several days of intermittent failures

Network Failures – Example Incidents

• PlanetLab – Experiment overloads the university's internet connection – Detected by bandwidth spikes
• Bank of America – Network connection problem slowed banking service – Several days of intermittent outages
• Sprint – ISP passes bad routing information – 2 hours of downtime

Security Violations

• Unauthorized accesses
• Password disclosures
• Denial-of-service attacks (DoS / DDoS)
• Software vulnerabilities
• Viruses, worms

Security Violations – Example Incidents

• Microsoft – Aug 2003 – DoS attack causes 1 hour of website downtime
• Akamai – Jun 2004 – DoS attack on DNS servers caused 2 hours of downtime for Google, Yahoo, Apple and Microsoft
• Google – Jul 2004 – MyDoom worm causes a partial outage for several hours
• Verizon – May 2004 – Theft of network cards caused customers to lose their internet access for one day
• Many recent events – Sony Pictures – Google

Manifestation of Errors                  Examples
Partial or Entire Website Unavailable    File not found, Web server crashed
System Exceptions / Access Violations    Runtime exceptions
Incorrect Results                        Wrong page served, Invalid cache used
Data Loss or Corruption                  Disk block failures
Performance Slowdowns                    Network congestion, System overload

Fault Chains

• Series of component failures
• Uncoupled fault chains
  • Independent failures occur one after another
• Tightly-coupled fault chains
  • Correlated failures
  • For example, a power outage caused the air conditioning to fail
  • Software dependencies
• 60% of the failures have a fault chain of length two
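A tightly-coupled fault chain can be modelled as failure propagation through a dependency graph. The sketch below assumes a hypothetical `DEPENDENTS` map, which records for each component the components that rely on it. It lists, breadth-first, every component that a single root failure would take down. An uncoupled chain, by contrast, would be a sequence of separate root failures that do not reach each other.

```python
from collections import deque

# Hypothetical dependency map: component -> components that depend on it.
# The power -> air conditioning -> servers path mirrors the example above.
DEPENDENTS = {
    "power": ["air_conditioning"],
    "air_conditioning": ["servers"],  # servers overheat without cooling
    "servers": ["website"],
    "database": ["website"],
}


def fault_chain(root, dependents):
    """Return the components that fail, in breadth-first order, once `root` fails."""
    order, seen = [], {root}
    queue = deque([root])
    while queue:
        component = queue.popleft()
        order.append(component)
        for dep in dependents.get(component, []):
            if dep not in seen:  # each component fails at most once
                seen.add(dep)
                queue.append(dep)
    return order


print(fault_chain("power", DEPENDENTS))
print(fault_chain("database", DEPENDENTS))  # a chain of length two
```

Real dependency graphs are rarely this clean. Partial failures, retries and timeouts can delay or mask the spread, which is one reason fault chains lengthen downtime.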

Prevalence of Failures

• 89% of customers have experienced issues when completing transactions
• 72.5% of sites experienced failures during the holiday season

(Chart: Causes of Site Failures – Application Failures, Human Error)


(Chart: Planned Downtime vs. Unplanned Downtime)

Evaluation – Comprehensive?

• Based on 40 real-world test cases
• Not connected with the causes of software failure in general
• Small subset – the evaluation could be biased

Application Domain

• Restricted to web applications
• Large websites like AOL, Microsoft, Walmart, etc.

Didn't Consider the Type of Web Applications

• Causes of failure are related to the web application type
• For example, a news website is more likely to fail from flash crowds than an online CMS

Types of Failure – Taxonomy

• The four-category fault taxonomy is quite primitive
• What type is a device driver failure? Software or hardware?

A Few More Important Causes of Failure

• Website not tested on different platforms
  • e.g. smartphones, tablets
• DNS problems
• Bandwidth – Web host decides to cut you off because you consumed too much bandwidth
• Police raid – What happened with The Pirate Bay :)



Some Thoughts

Web application failures can generally be traced back to the development phase:

• Lack of a well-defined scope
• Lack of professional project management
• Poor version control
• Trying to reinvent the wheel
• No functional testing
• "Freelance syndrome" :)

• Website failures are prevalent
  • Loss of revenue
  • Long-term losses like customer dissatisfaction
• Important study
  • Failure taxonomy
  • Causes
  • General causes
  • Real-world case studies
• Can be improved, extended and updated!