thinh-tran-van
Web Caching (Part 1)

A caching server sits between clients and web servers and answers client requests on behalf of the web server when:

1. The client requests an HTML object that is stored in the cache.
2. The cached object is the same version the web servers would generate if the request reached them (it is fresh).

There are three kinds of web cache: browser cache, proxy cache (forward proxy, transparent proxy), and gateway cache (reverse proxy). Here we only discuss the third kind: the reverse proxy cache, or reverse proxy for short. A simple reverse proxy model looks like this:

    Clients -------- Reverse proxy --------- Web Servers

When a client requests a URL, the reverse proxy receives the request. There are two cases:

1. If the URL is in the cache and is still *considered* fresh, the reverse proxy answers the request without making another request to the web server behind it.
2. If the requested URL is not in the cache, the reverse proxy queries the web servers, stores the response (if the URL exists), and answers the client's request.

(This part originally had more detail on HTTP headers, going through specific headers one by one. After rereading the draft, I realized all of it is already in the RFC and on countless other websites and personal blogs, so I deleted it.)

When a large number of users must be served, the reverse proxy and web server in the model above can no longer keep up with the number of requests per second. Administrators then usually scale the cache layer by adding more cache servers to the model:

                                     Cache A
    Requests --- Load Balancer ----- Cache B ------ Web Servers
                                     Cache C

This model is similar to the one above, except that a request arriving at the load balancer is routed into the cache farm according to some balancing algorithm: round robin | server load | weight | priority...

The cache farm likewise answers directly if the URL is in the cache, or queries the web servers if it is not. This model is widely used because it greatly reduces the load on the back-end web servers and serves *far* more requests than the original model. However, it still has limitations.
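The load-balanced model above can be sketched in a few lines of Python. This is only an in-process simulation under simplifying assumptions: the class names (`Origin`, `CacheNode`, `RoundRobinBalancer`) are hypothetical, freshness checks are ignored, and round robin is just one of the balancing algorithms listed above. The counter on `Origin` makes the load on the back-end visible.

```python
from itertools import cycle


class Origin:
    """Hypothetical stand-in for the back-end web servers."""

    def __init__(self):
        self.hits = 0  # number of requests that reached the back-end

    def fetch(self, url):
        self.hits += 1
        return f"<html>content of {url}</html>"


class CacheNode:
    """A reverse proxy with its own private cache (no freshness logic)."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.store = {}

    def get(self, url):
        if url in self.store:
            return self.store[url], "HIT"
        body = self.origin.fetch(url)  # cache miss: ask the back-end
        self.store[url] = body
        return body, "MISS"


class RoundRobinBalancer:
    """Routes each request to the next cache node in turn."""

    def __init__(self, nodes):
        self._next = cycle(nodes)

    def handle(self, url):
        node = next(self._next)
        _, status = node.get(url)
        return node.name, status


def demo_load_balanced():
    origin = Origin()
    nodes = [CacheNode(name, origin) for name in ("A", "B", "C")]
    lb = RoundRobinBalancer(nodes)
    results = [lb.handle("/news/1") for _ in range(6)]
    return results, origin.hits


if __name__ == "__main__":
    results, hits = demo_load_balanced()
    for name, status in results:
        print(f"Cache {name}: {status}")
    print("requests that reached the web servers:", hits)
```

In this sketch every cache node fills its cache on its own, so the same URL reaches the web servers once per node before the farm starts answering from cache.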
Note: we will not discuss the load balancer as a single point of failure here. If it is covered at all, it will probably be in another post.

Back to the model above. If Client A requests URL(1) for the first time (or URL(1) has changed and the cached object is marked stale), the LB routes the request to Cache A. Cache A checks its cache, does not find the requested object, and queries the back-end web servers. The same thing happens if Client B and Client C request the same URL(1) and are routed to Cache B and Cache C.

The anomaly is clear:

1. The same task is repeated on N reverse proxies if N requests ask for the same URL.
2. For N requests, the back-end web servers have to do the same work N times: create N threads (or processes), invoke CGI or FCGI processes, query the database, and return the same HTML object.

That is entirely unnecessary. So we consider a new model: the cache farm.

A cache farm has exactly the same layout as above, except that the reverse proxies have family relationships (parent, sibling) with each other and with the back-end web servers (depending on how the cache is implemented). The cache cluster operates on a dedicated protocol: ICP (Internet Cache Protocol). This protocol is used to share cached objects among caching servers based on relationships defined in advance. The terminology in this post is taken entirely from Squid (Traffic Server uses child cache and parent cache, and each child or parent cache is called a node in the cluster). The relationships are as follows:

- Parent: A cache peer sends an ICP request to the parent cache if the URL requested by the client is not found on any sibling cache peer. If the URL is still not found at the parent cache, the parent forwards the ICP_QUERY or makes an HTTP request to another parent cache assigned to it.
- Sibling: If a client requests a URL that is not found in the local cache, the sibling cache sends multiple ICP_QUERY requests to its siblings (and parent caches) to look up that URL. If any sibling replies with an ICP_HIT response, the sibling cache requests the object from the server that sent the earliest ICP_HIT response and receives an ICP_HIT_OBJ response. If the sibling cache receives no ICP_HIT response from any sibling, it sends the request to the parent cache.
A sibling cache is not allowed to forward an ICP_MISS to a parent cache when that ICP_MISS came from another sibling cache.

This means that in almost every case we need to configure a sibling cache with several sibling caches and at least one parent cache, so that the reverse proxy does not answer the client with an error code (Not Found).

Also take care not to configure too many siblings and parents for one cache peer. ICP requests are sent to many sibling and parent caches at the same time, which increases network latency, or causes congestion when too many requests miss the cache.

ICP was developed very early. One of its strengths, and a goal of its creators, was to keep each packet as small and as fast as possible while still carrying the necessary information. The caching software most of us use today is Squid (the oldest), Apache Traffic Server (originally Yahoo Traffic Server, which Yahoo later contributed to the open source community and which is now sponsored by the Apache Foundation), Varnish Cache, Nginx (not really caching software, but it works very well when configured as a reverse proxy), and of course other commercial caching software.

Of these, Squid and Traffic Server have implemented and supported ICP since their early days (ATS also uses another protocol, RPC), Varnish does not support it ( https://www.varnish-software.com/blog/why-icp-isnt-happening-and-generally-bad-idea ), and nginx should not support it (it is a web server, and caching is only an additional feature).

A principle of system building: "Whenever we add a feature to a system, the likelihood of errors grows by 20% along with it." So do not be surprised if, having read this far, I tell you that I do not use ICP.

Reasons:

1. If you read the linked Varnish blog post, you will see they are quite right that requesting and receiving a cache object over HTTP is a minor cost on today's network infrastructure (Gbps). If HTTP is slower than ICP at all, the difference is less than a few milliseconds. And that is indeed the case.
If you have ever worked out how long a Squid cache needs to "clone" a news website with a few dozen sections and a few hundred HTML pages plus content (driven by user requests), you will see that this assessment is entirely reasonable.

2. Using ICP means attaching one more service to the system and using two protocols to do the same job (fetching cached objects). The complexity of the system increases, not to mention the effort needed to keep ICP working effectively. Varnish also mentions content duplication when ICP is used, simply because Varnish does not store the same cache object on every node but uses a hash table to record where each object is cached. However, if you keep reading this post, it means you accept content duplication across all nodes. We will not discuss the pros and cons of that hash table here. In addition, Varnish's architecture is regarded as an excellent architecture for modern operating systems, and Varnish is praised on all of these points. (I do not use Varnish.)

Not using ICP (and of course not HTCP either) means we use HTTP. Using HTTP is simple because the back-end server fully supports the HTTP protocol (it is a web server), and the nodes in the cache farm also fully support the HTTP protocol (they are HTTP proxies). The original model describes how a web cache requests an object from the back-end server on behalf of the client, stores the object in cache memory, and answers the client's request with that cached object.

So if we add a node to the cache farm, we simply configure that node so that on a cache miss it calls the first web cache and the back-end server. The same applies when we add yet another node.

Note that you always need to declare the back-end server as a cache peer. This spreads the load according to the balancing algorithm built into each proxy cache (the author always uses round robin), and the back-end server is always the one server able to return a response with the latest content the client needs.

http://blog.tinytechie.net/2011/07/web-caching.html
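The HTTP-only peering described above can be sketched as follows. This is a minimal in-process simulation, not a real proxy configuration: the names `Backend` and `CacheNode` are hypothetical, method calls stand in for HTTP requests, freshness is ignored, and round robin over the upstream list is assumed because that is the algorithm the author mentions using.

```python
from itertools import cycle


class Backend:
    """Hypothetical back-end web server; always has the latest content."""

    def __init__(self):
        self.requests = 0  # how many requests reached the back-end

    def get(self, url):
        self.requests += 1
        return f"<html>{url}</html>"


class CacheNode:
    """A cache node whose upstreams are other caches and the back-end.

    On a local miss it forwards the request to the next upstream in
    round-robin order, using the same get() call it serves itself
    (standing in for a plain HTTP request).
    """

    def __init__(self, name, upstreams):
        self.name = name
        self.store = {}
        self._upstreams = cycle(upstreams)

    def get(self, url):
        if url in self.store:
            return self.store[url]  # local hit
        upstream = next(self._upstreams)  # round robin over the peers
        body = upstream.get(url)
        self.store[url] = body
        return body


def demo_http_peers():
    backend = Backend()
    # The first node only knows the back-end.
    cache1 = CacheNode("cache1", [backend])
    # A node added later is configured with the first cache and the back-end.
    cache2 = CacheNode("cache2", [cache1, backend])

    cache1.get("/a")  # cache1 miss, fetched from the back-end
    cache2.get("/a")  # cache2 miss, answered by cache1 from its cache
    cache2.get("/b")  # cache2 miss, round robin picks the back-end
    cache2.get("/a")  # cache2 local hit
    return backend.requests, cache1, cache2


if __name__ == "__main__":
    requests, cache1, cache2 = demo_http_peers()
    print("requests that reached the back-end:", requests)
    print("cache1 holds:", sorted(cache1.store))
    print("cache2 holds:", sorted(cache2.store))
```

Because every upstream speaks the same request interface, adding a node only means listing the existing caches and the back-end as its upstreams. No second protocol is involved, at the cost of the same object being stored on several nodes.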