Best Practices to create High Load Websites

Embed Size (px)

DESCRIPTION

High performance for a Web server that receive a large numbers of requests is critical success factor for a web site, but in many cases the Web server is only “tip of the iceberg” of a very large heterogeneous systems, with lots of components and technologies. This talk present best practices to design an high availability and high performance web site. The presentation will cover load balancing, Web server acceleration, and efficient management of dynamic data, that can be adopted by any sites to improve performance and availability. We also describe common mistake implemented in the web application framework that create performance limitations and bottleneck. The presentation will describe how to define monitors metrics of the service , that are the “eyes” of operation departments, and the implementation of the “red button”

Text of Best Practices to create High Load Websites

  • 1.Best Practices to create High Load Websites!Beolink.org!

2. Agenda Beolink.org! Introduction! Design and Optimizations! Generic Optimizations! Presentation Layer! Application Layer! Data Layer! Example! Service Operation! Monitoring! Emergency Operations!2!9/5/11! 3. Introduction: The value of an ITService Beolink.org! From an ITIL perspective, the value is composed by two components: utility (tness for purpose) and warranty (tness for use.)! ! Utility is a "[f]unctionality offered by a product or service to meet a particular need. Utility is often summarized as what it does."! ! Warranty as "[a] promise or guarantee that a product or service will meet its agreed requirements" and as "derived from the positive effect of being available when needed, in sufcient capacity, and dependably in terms of continuity and security."! 3!9/5/11! 4. IntroductionBeolink.org!Do you have the new Amazon website ?! !4.3 million pages/day != 50 pages/s, I dont think so !! !1000 pages/s = 86 mil pages/day!Interesting gure! 4! 9/5/11! 5. Introduction Beolink.org! How fast is fast enough? ! 5!9/5/11! 6. Introduction: user perceptionBeolink.org!80% of the end-user response time is spent on the front-end. !!Most of this time is tied up in downloading all the components in the page:images, stylesheets, scripts, Flash, etc. Reducing the number of componentsin turn reduces the number of HTTP requests required to render the page!(see Netix case studies)! Request!Render! css loading, asset loading, javascript loading!Response!Time! 6! 9/5/11! 7. IntroductionBeolink.org! Do you know yourApplication Architecture ? ! 7! 9/5/11! 8. Introduction: Architecture [ktkt] Beolink.org! A software architecture is an abstraction of the run-time elements of a software system during some phases of its operation. A system may be composed of many levels of abstraction and many phases of operation, each with its own software architecture.! ! A software architecture is dened by a conguration of architectural elements--components, connectors, and data-- constrained in their relationships in order to achieve a desired set of architectural properties.! ! A component is an abstract unit of software instructions and internal state that provide a transformation of data via its interface.! ! A connector is an abstract mechanism that mediates communication, coordination, or cooperation among components.! ! Datum is an element of information that is transferred from a component, or received by a component, via a connector.! ! [1] Roy Filding theis! 8! 9/5/11! 9. Introduction: Web Architecture Beolink.org! Presentation Layer supports a client, the system needs to have a presentation layer throughPresentation! which the user can submit operations and obtain a result.! !Middleware! Middleware Layer is just a level of indirection between presentation and other layers of the system. It introduces an additional layer of business logic encompassing all underlying Application! systems. ! ! Application layer establishes what operations can be performed over the system and how they Resource! take place. It takes care of enforcing the business rules and establishing the business processes. ! ! Resource (Datum) deals with the organization (storage, indexing, and retrieval) of the data necessary to support the application logic. !9! 9/5/11! 10. Beolink.org!Design! 10! 9/5/11! 11. Design: NumbersBeolink.org!qRequest per second!!qPage composition!qContent type/Dimension!qNumber of users!qPeaks! 11!9/5/11! 12. Introduction: Architectural Properties Beolink.org! Simplicity! Customizability!NetworkEfciency! Scalability!User-perceived Performance! Network Performance!Modiability! Evolvability!Extensibility! Congurability! Reusability! 12!9/5/11! 13. Design: MatrixBeolink.org! Properties! Value! Network Performance ! User-perceived Performance! Network Efciency! Scalability! Simplicity! Modiability! Evolvability! Extensibility! Customizability! Congurability! Reusability!13! 9/5/11! 14. Beolink.org!Layers! 14! 9/5/11! 15. Design: Layer 0Beolink.org! qSplit the system in pieces! ! qI/O! qTCP/IP!!! qFilesystem! !! ! !! 15!9/5/11! 16. Design: I/OApplication! Resource!Beolink.org! qI/O! !Ethernet: bonding! !Disks: SAS/SSD/innibend! !! qFilesystem! !Avoid NFS! !Different FS (for operation type)!ls: cannot access nfsdir: Stale NFS le handle! !Specic Options (noatime)! !Block size! !Journaling! ! qLogging! !Dedicated sites! ! AMQP!760,000msg/secingress on an 8 waybox or 6,000,000msg/sec OPRA messages.!16!9/5/11! 17. Design: TCP/IP Application! Resource!Beolink.org! qTCP/IP! 500 req/sec*900 = 450.000 sockets !! !net.ipv4.tcp_tw_reuse=1!! !net.ipv4.tcp_tw_recycle=1!! !net.ipv4.tcp_n_timeout=30! ! !net.ipv4.tcp_keepalive_time=300! !/proc/sys/net/ipv4/ip_local_port_range! ! fs.le-max=128000! !net.core.somaxconn=250000! !net.ipv4.tcp_max_syn_backlog=2500! !net.core.netdev_max_backlog=2500! !ulimit -n 10240!! qIP Contract!ip_conntrack: table full, dropping packet.!! !net.ipv4.netlter.ip_conntrack_max! !! ! ! qFirewall!limit rate! !Request / rate! limitburst number! !IP ACL! !! !Dynamic ACL (phrel)!! !17! 9/5/11! !! 18. Design: PresentationApplication!Resource!Beolink.org! qYahoo Rules! qLoad Distribution! qProxy/Caching!Application! qWeb Server! Resource! qContent DeliveryNetwork ! qSplit access on differentDomains! !18! 9/5/11! 19. Design:: Yahoo! Rules Beolink.org! The Exceptional Performance team (Yahoo!) has identied a number of best practices to make web pages fast. The list includes 35 best practices divided into 7 categories.! !25 % without modifying infrastructure! Content!Server!Cookie! CSS! Javascript!Images!Mobile! Make FewerUse get forReducePutPut scripts at Make KeepHTTP Req! Ajax Req!Cookie Size!StylesheetsBotton!favicon.icocomponents Make Ajax Flush Buffer Use Cookie-at Top! Makesmall andunder 25 kb!Cacheable!Early! Gree Avoid CSS javascript and Cacheable!Pack Reduce the!Domains!exp! CSS ext!Optimize! componentsnumber of !Avoid Filters! Minify!Do not Scaleinto aDOM!! !Image in multipart !HTML!document!! 19! 9/5/11! 20. Design:: Yahoo! Rules Beolink.org! Example!20! 9/5/11! 21. Design: Load DistributionApplication! Resource!Beolink.org!q DNS! - Low TTL! - GEO IP! - Round Robin! - More than one IP! - Response base on system load! - Split Components acrossDomains!q Load Balancer! - TCP/IP! - Layer 7! - SSL!q Anycast! - Up to 32 systems per IP! - High availability on WAN ! 21! 9/5/11! 22. Design: Proxy/Caching Application!Resource!Beolink.org!120,000"100,000" q Congure ETags! 80,000"Throughput) q Add expiration or 60,000" 40,000" Cache-control Header! 20,000"0" ATS"2.1.9"Nginx"0.8.53" Varnish"2.1.5" q Extension modules!Req"/"sec" q Reverse Proxy base on url or domains! q Redirect on business logic (Middleware)! 22! 9/5/11! 23. Design: WebServerApplication! Resource!Beolink.org!q VirtualHost with dedicated IP!q Compress content !q Process Model! q Number of Process! q Number of clients! Apache optimization!! q Spare..! Remove unneeded modules!Set AllowOverride to None!Avoid FollowSymLinks and SymLinksIfOwnerMatch!Avoid content negotiation (Multiview)!MaxClients =(Total Memory - Operating Systemq KeepAlive and KeepAliveTimeout! Memory ) / Size Per Apache process.!!The KeepAlive directive allowsMinSpareServers, MaxSpareServers, and StartServers:!multiple requests to be sent over the Apache can spawn a maximum of 32 child processes persecond!same TCP connection. It changesmodel from Request to User!23!9/5/11! 24. Design: Content Delivery Network Application! Resource!Beolink.org!A content delivery network orcontent distribution network(CDN) is a system of computerscontaining copies of data placedat various nodes of a network.!!!! qPreload! qAccess Control ! 24! 9/5/11! 25. Design: Modular Application LogicBeolink.org!qDistribution!! Presentation!qAccelerator!qSession and Cookies!qMore instance on same Resource! system!! ! 25!9/5/11! 26. Design: Distribution Beolink.org! ! qShared! All components have all the functions (round robin)! qFunction/resource! Components are grouped by function/resource! qUser! Components are divided by cluster of users! 26!9/5/11! 27. Design: AcceleratorBeolink.org!qPHP!Accelerator !! !PHP!!with APC!!%! !with eA !! !%!! Tot sec !175.62 !33.21!528.83% !29.18!601.85%!Req/sec !5.69 !30.11!529.17% !34.26!602.11%!!ms per req !175.62 !33.21!528.83% !29.19!601.71%!!HIPHOP!!!qPython! !!Framework !Transaction rate [1/sec]! Go http!!2063 !!!Twister!!2020 !!!Web.go !!1753 !! Tornado!!1662 !!!Tornado+nginx !1364 !! Web.py+gevent !888!! Web.py+gunicorn !538!! Web.py+CheryPy!304!! Web.py+up+nginx!211!! 27! 9/5/11! 28. Design: Modular Application Logic Beolink.org!qCookies!Size!Encryption (key rotation)!!!!!!qSession!!Sticky session->table in memory!!Round Robin->memcached!!Two levels (NUMA)!!!28! 9/5/11! 29. Design: More instance Beolink.org!qBetter CPU Usage! Many applications do not support parallelization, then it is not possible to implement a scale up approach (deploying the applications on larger servers)!!!qBetter Memory Usage! Many applications still use 32 bits or have xed internal data structure!29! 9/5/11! 30. Design: Modular Application Logic!Beolink.org!qChoose the correct algorithms and data structures! dqueue vs list, hash vs trees, locks vs read/write locks, bloom lter!qMemory allocation! Reuse memory, stack vs heap, tcmalloc!qMake fewer system calls! Larger writes and reads! 31. Design: Data LayerBeolink.org!qFilesystem! q Distributed!Presentation! q Replication!qDatabase! Application! q Partitioning! q Replication! q NoSQL!qHierarchical Storage! q Directory Server! q JSR-170/230! ! 31! 9/5/11! 32. Design: Filesystem Beolink.org!qDistributed Filesystem!!- Local copy/cache!!- Parallel!!!qReplication!!- rsync+inotify!!- DRDB!!!!Do not use le system as a !COMMUNICATION PROTOCOL !! 32!9/5/11! 33. Design: DatabaseBeolink.org!qPartitioning!Small table!Many systems!!qReplication!Separation btw Read and Write!Different Index on different