Web Speed And Scalability
Jason Ragsdale – 01/08/2008
How to build a bigger, faster, and more reliable website
You will learn the concepts of Speed and Scalability
Specific Examples of Caching, Load Balancing and testing tools.
What is Scalability?
Avoiding Failure
High Availability?!?!?
Monitoring
Release Cycles
Fault Tolerance
Load Balancing
Static Content
Caching
YSlow (Let it be your friend)
Horizontal Scalability
  Capacity can be increased just by adding more hardware/software
  Best solution
  Does not guarantee that you are safe
Up (Vertical) Scalability
  Capacity can be increased by adding more disk storage, RAM, and processors
  Expensive
  Should only be used if horizontal scaling will not work for you
  Difficult to move to horizontal scaling if you run out of capacity in your hardware
Capital investment will be made
The system will be more complex
Maintenance costs will increase
Time will be required to act
Good Planning
  Have a plan for whatever you are about to do to your system, and most importantly, have a roll-back plan if and when things do not work the way you expected.
Functional and Unit Testing
  Automated tests do not catch everything that can go wrong, but they are very good at catching bugs introduced by changes elsewhere in your code base
  Unit Testing (PHPUnit, SimpleTest)
  Functional Testing (Selenium)
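As a minimal sketch of the kind of automated check described above: the slides name PHPUnit and SimpleTest, while this example uses Python's built-in unittest module as a stand-in. The function page_count is hypothetical and exists only to give the tests something to protect.

```python
import unittest


def page_count(total_items: int, per_page: int) -> int:
    """Hypothetical helper: how many pages are needed to list total_items."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Ceiling division without floating point.
    return (total_items + per_page - 1) // per_page


class PageCountTest(unittest.TestCase):
    """If a later change breaks page_count, these tests fail immediately."""

    def test_exact_multiple(self):
        self.assertEqual(page_count(20, 10), 2)

    def test_partial_last_page(self):
        self.assertEqual(page_count(21, 10), 3)

    def test_no_items(self):
        self.assertEqual(page_count(0, 10), 0)

    def test_rejects_zero_page_size(self):
        with self.assertRaises(ValueError):
            page_count(5, 0)
```

Saved to a file, the tests can be run with python -m unittest; running them on every change is what lets them catch bugs introduced elsewhere in the code base.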
Control Change (Version Control)
  USE IT!!!!
  There is no better way, even as a single developer, to keep your codebase safe from bad changes
Version Control in Action
  /trunk/
    Used for all mainline development
  /production/
    Only stable, production-ready code from trunk is contained in here
    Only make severe bug fixes in this branch
  /tags/
    Holds copies of production-ready code
  Do not use version control as a backup solution; back up your VCS separately
High Availability?!?!?!
  What is “five nines” (99.999%)?
  Do the math: 60 seconds * 60 minutes * 24 hours * 365 days = 31,536,000 seconds in a year
  (1 - 0.99999) * 31,536,000 = 315.36 seconds of downtime a year
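The arithmetic above can be sketched in a few lines of Python. The function name and the list of availability targets are illustrative assumptions; the year length ignores leap years, as the slide does.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000 (leap years ignored)


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Seconds per year a service may be down and still meet the target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Compare "five nines" with some looser targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        seconds = allowed_downtime_seconds(target)
        print(f"{target}% -> {seconds:,.2f} seconds ({seconds / 60:,.2f} minutes) per year")
```

Five nines leaves a little over five minutes of downtime a year, which is why planned maintenance has to be handled without taking the whole service offline.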
Understand the goodness of “Planned maintenance periods”
  There are things you will need to do to your systems on a periodic basis
    e.g. database cleanup, disk defrag, software/hardware upgrades
  If you have enough servers, you can stagger your maintenance periods so that there is no customer downtime, just a reduction in capacity
Monitoring
  No matter how stable your code is or how reliable your hardware, you will have failures
Monitoring Methods
  Top Down (Business Monitors)
    Monitor the application as the customer interacts with it
  Bottom Up (System Monitors)
    Most commonly used
    Monitors the base components of your application, like
      Disk space
      Network speed
      Database statistics
    By no means bad, but without business monitoring you will not be able to catch all failures
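To make the difference between the two monitoring methods concrete, here is a rough sketch with one check of each kind. The function names, the 10% free-space threshold, the URL, and the expected page text are all assumptions made for illustration; a real business monitor would drive the actual site rather than a stand-in fetch function.

```python
import shutil
from typing import Callable


def disk_space_ok(path: str = ".", min_free_fraction: float = 0.10) -> bool:
    """Bottom-up check: is at least min_free_fraction of the disk still free?"""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def page_ok(fetch: Callable[[str], str], url: str, expected_text: str) -> bool:
    """Top-down check: does the page a customer sees contain what it should?

    fetch is any callable that returns the page body for a URL, so the check
    can be exercised without a network connection.
    """
    try:
        return expected_text in fetch(url)
    except Exception:
        # Any failure to load the page counts as a failed check.
        return False


if __name__ == "__main__":
    # Stand-in for a real HTTP client: the server answers, but with an error page.
    def broken_site(url: str) -> str:
        return "<h1>500 Internal Server Error</h1>"

    print("disk ok:", disk_space_ok())
    print("checkout ok:", page_ok(broken_site, "http://shop.example/checkout", "Order complete"))
```

The system check can pass while the customer-facing check fails, which is the kind of failure that bottom-up monitoring alone will not catch.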
Criteria For A Monitoring System
  SNMP support
    Can support most systems out there
  Extensibility
    Ability to plug in custom monitoring packages
  Flexible notifications
    Handle notifying operators and escalating issues if they are not looked into
  Custom reaction
    In the event of errors that cannot be diagnosed by computers, need to be able to notify a human to do further investigation
  Complex scheduling
    Ability to set the monitoring frequency and timing per monitoring item
  Maintenance scheduling
    Monitors should never be taken offline; they need to be smart enough to know when a maintenance period is in effect
  Event acknowledgement
    Ability to understand when an event needs to be paged to a human at 2am, and when it shouldn't
  Service dependencies
    You need to monitor all points between your monitoring system and the client. This includes firewalls, routers, and switches
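Two of the criteria above, maintenance scheduling and deciding when an event should page a human, can be sketched together. Everything here is a simplified assumption: the MaintenanceWindow class, the should_page function, and the two severity labels are made up for illustration and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class MaintenanceWindow:
    """A planned period during which disruption is expected."""
    start: datetime
    end: datetime

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def should_page(severity: str, moment: datetime,
                windows: Sequence[MaintenanceWindow]) -> bool:
    """Page a human only for critical events outside planned maintenance."""
    if any(window.covers(moment) for window in windows):
        # The monitor keeps running; it just knows this disruption is planned.
        return False
    return severity == "critical"


if __name__ == "__main__":
    upgrade = MaintenanceWindow(datetime(2008, 1, 8, 2, 0), datetime(2008, 1, 8, 4, 0))
    for when in (datetime(2008, 1, 8, 3, 0), datetime(2008, 1, 8, 5, 0)):
        print(when, "-> page:", should_page("critical", when, [upgrade]))
```

The monitor itself stays online throughout; only the notification decision changes while a window is in effect.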
Release Cycles
  Basic Release Cycle
    Development
      Things are expected to break
    Staging
      QA and bug fixing a build before release
    Production
      Only serious bug fixes are pushed
  Keep in mind that reality has priority over “Best Practice”
    You can and will have to release from development… it happens
Fault Tolerance
  [Diagram: Intertubes, router, switch, and web servers www-1-1 and www-1-2]
Load Balancing
  Load balancing is NOT HA
  Balancing is meant to spread the workload of requests across the cluster
Balancing Approaches
  Round robin
    One request per server in a uniform rotation
  Least connections
    The faster a machine processes requests, the more it will receive
  Predictive
    Usually based on round robin or least connections with some custom code
  Available resources
    Not a good choice, bad performance
  Random
    Pure random distribution of requests
  Weighted random
    Random with a preference to specific machines
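Three of the approaches listed above are small enough to sketch directly. This is an illustration of the selection logic only, not how any particular load balancer implements it; the function names and the connection counts and weights in the demo are assumptions.

```python
import itertools
import random
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed, repeating rotation."""
    return itertools.cycle(servers)


def least_connections(active_connections: Dict[str, int]) -> str:
    """Pick the server currently handling the fewest connections.

    A fast server finishes requests sooner, so its count stays low and it
    naturally receives more work.
    """
    return min(active_connections, key=active_connections.get)


def weighted_random(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick a server at random, preferring those with a larger weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


if __name__ == "__main__":
    rotation = round_robin(["www-1-1", "www-1-2"])
    print([next(rotation) for _ in range(4)])
    print(least_connections({"www-1-1": 12, "www-1-2": 3}))
    print(weighted_random({"www-1-1": 3.0, "www-1-2": 1.0}, random.Random()))
```

None of these selection rules detects a dead server on its own, which is one reason load balancing is not the same thing as high availability.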
Static Content
  Static content is
    Images
    CSS
    JS
    Any non-dynamic element
  Serving these items from a dedicated server frees up your web processes for actual dynamic code, in turn increasing your capacity and response speed
  On your static server you can use lighttpd, which is very quick at serving static content compared to Apache (although Apache 2.2.x is much better than 1.3.x)
Types of Caching
  Layered / Transport Cache
    “Transparent”
    Placed in front of your hardware; caches requests before they hit your web server
  Integrated (Look-Aside) Cache
    Computational reuse technique
    Used where the cost of storing the results of a computation and later finding them again is less than the cost of performing the computation again
  Write-Thru Caches
    Application is responsible for updating the cache and datastore when changes are made
  Write-Back Caches
    All data changes are made to the cache
    Cache layer is responsible for modifying the backend datastore
  Distributed Cache
    Using several machines to cache data, distributing the data and load
    Memcached can do this very simply
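A minimal in-process sketch of two of the patterns above, look-aside and write-through, may help. Plain dictionaries stand in for both the cache and the datastore, and the class and function names are assumptions chosen for illustration.

```python
from typing import Callable, Dict


class LookAsideCache:
    """Look-aside cache: check the cache first, compute only on a miss."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute          # the expensive operation being avoided
        self._store: Dict[str, str] = {}
        self.misses = 0

    def get(self, key: str) -> str:
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(key)
        return self._store[key]


def write_through(key: str, value: str,
                  cache: Dict[str, str], datastore: Dict[str, str]) -> None:
    """Write-through: the application updates the datastore and the cache together."""
    datastore[key] = value
    cache[key] = value


if __name__ == "__main__":
    cache = LookAsideCache(lambda name: name.upper())  # stand-in for slow work
    cache.get("report")
    cache.get("report")
    print("misses:", cache.misses)  # the second call was served from the cache
```

In a write-back design the application would write only to the cache, and the cache layer would later push the change to the datastore; that variant is not shown here.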
Memcached
  It is a high-performance, distributed object caching system
  It is simple to set up and use
    # ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
  It is not designed to be redundant
    If you lose data, the cache is repopulated as the data is accessed again
  It provides no security to your cache
    “Memcached is the soft, doughy underbelly of your application. Part of what makes the clients and server lightweight is the complete lack of authentication. New connections are fast, and server configuration is nonexistent. If you wish to restrict access, you may use a firewall, or have memcached listen via unix domain sockets.”
  Limitations
    Key size limited to 250 characters
    Data size limited to 1MB
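The sketch below illustrates two points from this slide: spreading keys across several cache servers, and rejecting items that exceed the stated limits before they are sent. It is not the algorithm of any real memcached client (clients use their own hashing schemes); the CRC32-modulo mapping, the function names, and the second server address are assumptions for illustration.

```python
import zlib
from typing import List

MAX_KEY_LENGTH = 250           # key limit from the slide, measured here in encoded bytes
MAX_VALUE_BYTES = 1024 * 1024  # 1MB data limit from the slide


def pick_server(key: str, servers: List[str]) -> str:
    """Map a key to one server; the same key always maps to the same server."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def check_limits(key: str, value: bytes) -> None:
    """Raise ValueError if the item would exceed the limits listed above."""
    if len(key.encode("utf-8")) > MAX_KEY_LENGTH:
        raise ValueError("key is longer than 250 bytes")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError("value is larger than 1MB")


if __name__ == "__main__":
    # 10.0.0.40:11211 is the address from the command above; the second is made up.
    servers = ["10.0.0.40:11211", "10.0.0.41:11211"]
    for key in ("user:1", "user:2", "session:abc"):
        check_limits(key, b"cached value")
        print(key, "->", pick_server(key, servers))
```

With a simple modulo mapping like this, adding or removing a server moves most keys to a different machine, so the cache has to repopulate; that is acceptable only because the cache is not treated as the authoritative copy of the data.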
APC and why it’s your friend
  Alternative PHP Cache
    The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. It was conceived of to provide a free, open, and robust framework for caching and optimizing PHP intermediate code.
  Just enabling APC will transparently cache your code as you use it; no code changes are required on your side
  Provides a cheap caching layer that can be shared between all Apache processes on one machine
YSlow?
  Based on 13 principles from http://developer.yahoo.com/performance/rules.html
  1.) Make fewer HTTP requests
    80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.
  2.) Use a CDN
    The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?
  3.) Add an Expires header
    Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.
  4.) Gzip components
    The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It's true that the end-user's bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.
  5.) Put CSS at the top
    While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages load faster. This is because putting stylesheets in the HEAD allows the page to render progressively.
  6.) Put JS at the bottom
    Rule 5 described how stylesheets near the bottom of the page prohibit progressive rendering, and how moving them to the document HEAD eliminates the problem. Scripts (external JavaScript files) pose a similar problem, but the solution is just the opposite: it's better to move scripts from the top to as low in the page as possible. One reason is to enable progressive rendering, but another is to achieve greater download parallelization.
  7.) Avoid CSS expressions
    CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They're supported in Internet Explorer, starting with version 5. As an example, the background color could be set to alternate every hour using CSS expressions.
  8.) Make JS and CSS external
    Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?
  9.) Reduce DNS lookups
    The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server's IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can't download anything from this hostname until the DNS lookup is completed.
  10.) Minify JS
    Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor.
  11.) Avoid redirects
    Redirects are accomplished using the 301 and 302 status codes.
  12.) Remove duplicate scripts
    It hurts performance to include the same JavaScript file twice in one page. This isn't as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.
  13.) Configure ETags
    Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for what I've been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraints are that the string be quoted. The origin server specifies the component's ETag using the ETag response header.
  14.) Make AJAX cacheable
    People ask whether these performance rules apply to Web 2.0 applications. They definitely do! This rule is the first rule that resulted from working with Web 2.0 applications at Yahoo!.
Example Apache 2.x performance config

# enable expirations
ExpiresActive On

# expire GIF and JPEG images, CSS, and JavaScript after a month in the client's cache
ExpiresByType image/gif A2592000
ExpiresByType image/jpeg A2592000
ExpiresByType text/css A2592000
ExpiresByType application/x-javascript A2592000

# disable ETags
FileETag None
# Gzip Compression

# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

# Don't compress images (or MP3 files, which are already compressed)
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|mp3)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
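To show what the directives above amount to on the wire, here is a rough Python sketch of the same two ideas: a far-future Expires header on static types and gzip compression of everything except images. It simplifies heavily: there is no browser sniffing, no Vary header, and the expiry is counted from a supplied time rather than from the client's access. The function build_response and both constants are hypothetical names, not part of Apache.

```python
import gzip
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict, Tuple

ONE_MONTH_SECONDS = 2592000  # the same 30 days as A2592000 in the config
EXPIRING_TYPES = {"image/gif", "image/jpeg", "text/css", "application/x-javascript"}


def build_response(body: bytes, content_type: str,
                   now: datetime) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, payload) for a response; now must be timezone-aware UTC."""
    headers = {"Content-Type": content_type}
    if content_type in EXPIRING_TYPES:
        expires = now + timedelta(seconds=ONE_MONTH_SECONDS)
        headers["Expires"] = format_datetime(expires, usegmt=True)
    if not content_type.startswith("image/"):
        # Images are already compressed, so only other content is gzipped.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


if __name__ == "__main__":
    css = b"body { color: red; }\n" * 200
    headers, payload = build_response(css, "text/css", datetime.now(timezone.utc))
    print(headers)
    print(f"{len(css)} bytes before compression, {len(payload)} bytes after")
```

Repetitive text such as CSS and JavaScript compresses well, which is why the rule targets those components first.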
YSlow: http://developer.yahoo.com/yslow/
Rules: http://developer.yahoo.com/performance/rules.html
Scalable Internet Architectures, by Theo Schlossnagle
APC: http://us3.php.net/apc
Memcached: http://www.danga.com/memcached/
Selenium: http://www.openqa.org/selenium/
SimpleTest: http://simpletest.org/
PHPUnit: http://www.phpunit.de/