NY Times: so news doesn't break your server

@NYTDevs | developers.nytimes.com

Varnish: Linchpin of the NYTimes.com Re-architecture

Adam E. Falk
Software Architect, Web Products

Who I Am

A software architect focusing on server configuration and resiliency, with sidelines in DevOps, release engineering, and testing.

Started as a LAMP developer but has always been a generalist interested in all aspects of the data center.

Who We Are

Photo credit: Tony Cenicola/The New York Times

Scope of this Presentation

Everything that follows pertains to the use of Varnish to accelerate serving content on the www.nytimes.com hostname only.

There are several other Varnish clusters at NYTimes.com.

NYTimes.com: Size

15+ million page URLs (1851–present)
● Not all HTML; working on that

200+ new page URLs created each day

Millions more image URLs

NYTimes.com: Traffic

www.nytimes.com normal daily peak is ~75,000 requests/second, for this hostname alone.

● Primarily APIs
● HTML traffic is ~4,000 req/sec

Traffic spikes up to 4x during a breaking news event

R.I.P. Leonard Nimoy

2013 Redesign of NYTimes.com

Mission Statement

“Leverage the latest technology in order to improve the user experience, enhance our journalism, and provide a more effective environment for our advertisers.”

Project document

Improve the User Experience

Technical goals:
1. 25% improvement in browser load time, minimum.
2. ...

Sounds like a job for page caching!

Achievement Unlocked

50% or better improvement in:
● Time to first byte
● Time to paint
● Time to page ready

Brave New World

Exception to the Rule

A complete code rewrite (almost). Why?
● < insert usual suspects here >
● Deeply embedded server-side personalization (includes ads)

Output was simply uncacheable.

Never Let a Crisis Go To Waste

☒ (Test|Behavior) Driven Development
☒ Web performance was core from Day 0
☒ Async wherever, whenever
☒ New APIs
☒ CSS: LESS (then), SASS (now)

Can We Cache Pages Now?

Yes, Virginia.

</summary>

Spotlights for You

VCL file modular organization

Cache refresh instead of purge

Varnish cluster today

Changing Horses in Midstream

Site functionality that must not break:
● redirects (mobile, registration, et al.)
● user tracking
● web crawler detection

Best Practice (singular)

Easy Yet Powerful

Greatest Thing Since Sliced Bread

☒ Single responsibility principle
☒ Code readability (and understanding!)
☒ Time spent troubleshooting
☒ Coding standards

Intermission

There are only two hard things in Computer Science:

1. Cache invalidation
2. Naming things
3. Off-by-one errors

http://martinfowler.com/bliki/TwoHardThings.html

Cache Invalidation

Purge is not good enough (in Varnish 3).

PURGE causes cache misses on the highest-traffic content.

Needed cache re(set|build|prime).
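To make the purge/refresh difference concrete, here is a toy, single-process model of one cache, offered as a sketch under stated assumptions. It is not Varnish and not the NYT implementation; the ToyCache class, its method names, and the origin function are hypothetical. purge() deletes the object, so the next reader pays for a backend fetch; refresh() mimics a request with req.hash_always_miss set, fetching a fresh copy and storing it over the old one, so readers keep getting hits.

```python
from typing import Callable, Dict


class ToyCache:
    """Hypothetical stand-in for one cache server (not Varnish itself)."""

    def __init__(self, origin: Callable[[str], str]) -> None:
        self.origin = origin              # assumed backend fetch function
        self.store: Dict[str, str] = {}   # url -> cached body
        self.hits = 0
        self.client_misses = 0            # reader requests that had to wait on the backend

    def _fetch_and_store(self, url: str) -> str:
        body = self.origin(url)
        self.store[url] = body
        return body

    def get(self, url: str) -> str:
        """A reader's request: serve from cache, else fetch from the backend."""
        if url in self.store:
            self.hits += 1
            return self.store[url]
        self.client_misses += 1
        return self._fetch_and_store(url)

    def purge(self, url: str) -> None:
        """PURGE-style invalidation: delete the object outright."""
        self.store.pop(url, None)

    def refresh(self, url: str) -> None:
        """Refresh-style invalidation (the idea behind req.hash_always_miss):
        skip the lookup, fetch a fresh copy, and store it over the old one."""
        self._fetch_and_store(url)


# Usage: publish version 2 of an article, then compare the two strategies.
versions = {"/article": 1}


def origin(url: str) -> str:
    return f"{url} v{versions[url]}"


purged, refreshed = ToyCache(origin), ToyCache(origin)
for cache in (purged, refreshed):
    cache.get("/article")            # warm the cache (one miss each)

versions["/article"] = 2             # the publish event
purged.purge("/article")
refreshed.refresh("/article")

print(purged.get("/article"), purged.client_misses)        # /article v2 2
print(refreshed.get("/article"), refreshed.client_misses)  # /article v2 1
```

Both caches end up serving v2, but after a purge the first reader waits on the backend, which is exactly what the slides want to avoid for the highest-traffic content.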

NYT Homepage

● Must always be in Varnish cache.
● Every article linked to on the homepage should already be in Varnish cache.

No cache misses = long TTL.

But...

Some content changes frequently.
The latest version must be served in real time after every publish action.

Short TTL = more cache misses.
PURGE = more cache misses.

Cache Rules Everything Around Me

CREAM: an API to re(set|build|prime) a single cache entry.

Publish event calls API synchronously.

req.hash_always_miss = true

CREAM requests the just-updated article from every Varnish server, in parallel.
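The slides don't show CREAM's code. As a hedged sketch of the fan-out idea only, the Python below sends one refresh request for a path to every Varnish server in parallel and blocks until all of them answer, matching the synchronous call from the publish event. Assumptions: the X-Cache-Refresh header name, the server host names, and the http_refresh and refresh_everywhere helpers are all hypothetical; the VCL on each server would have to recognize that header (from trusted clients only) and respond with set req.hash_always_miss = true;.

```python
import concurrent.futures
import urllib.request
from typing import Callable, Dict, List, Union

# Hypothetical header; each server's VCL would map it to
# `set req.hash_always_miss = true;` for trusted clients only.
REFRESH_HEADER = "X-Cache-Refresh"

Result = Union[int, Exception]


def http_refresh(server: str, path: str, timeout: float = 5.0) -> int:
    """Ask one Varnish server (addressed directly) to refetch `path`."""
    req = urllib.request.Request(
        f"http://{server}{path}",
        headers={"Host": "www.nytimes.com", REFRESH_HEADER: "1"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def refresh_everywhere(
    servers: List[str],
    path: str,
    send: Callable[[str, str], int] = http_refresh,
) -> Dict[str, Result]:
    """Refresh `path` on all servers in parallel; return once every server has answered."""
    results: Dict[str, Result] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {pool.submit(send, server, path): server for server in servers}
        for future in concurrent.futures.as_completed(futures):
            server = futures[future]
            try:
                results[server] = future.result()
            except Exception as exc:  # timeouts, HTTP errors, refused connections
                results[server] = exc
    return results


# Usage with a stand-in sender, so the example runs without a network.
def fake_send(server: str, path: str) -> int:
    if server.startswith("varnish03"):
        raise TimeoutError(f"{server} did not answer")
    return 200


servers = [f"varnish{i:02d}.example.internal" for i in range(1, 9)]
results = refresh_everywhere(servers, "/example/article.html", send=fake_send)
failed = [s for s, r in results.items() if isinstance(r, Exception)]
print(failed)  # ['varnish03.example.internal']
```

Returning per-server results, rather than raising on the first failure, lets a synchronous publish step decide whether to retry the stragglers; with only eight servers per data center, one thread per server is a reasonable simplification.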

Where We Are Today: Software

~2,300 lines of VCL code
● Minimum of inline C

10 VMODs
● std, utils, crashhandler, wurfl, boltsort, queryfilter
● 4 custom

Where We Are Today: Traffic

Of the ~4,000 page requests/second to www.nytimes.com:

● ~1,500 now served by Varnish
● ~91% cache hit rate (down from ~96%)

Where We Are Today: Performance

Load test: ~3,000 requests/second/server with current configuration

We could handle a 4x spike with 2 servers

We run 8 servers per data center
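The capacity claim is simple arithmetic, sketched below in Python. Beyond what the slides state, it assumes that a spike multiplies the ~1,500 requests/second Varnish currently serves, that the ~3,000 requests/second/server load-test figure holds for spike traffic, and that the ~91% hit rate applies to those same requests; servers_needed and backend_rps are hypothetical helper names.

```python
import math


def servers_needed(base_rps: float, spike: float, per_server_rps: float) -> int:
    """Fewest servers whose combined load-tested capacity covers the spike."""
    return math.ceil(base_rps * spike / per_server_rps)


def backend_rps(base_rps: float, hit_rate: float) -> int:
    """Requests per second that miss the cache and go through to the backend."""
    return round(base_rps * (1 - hit_rate))


VARNISH_RPS = 1_500   # page requests/second served by Varnish (from the slides)
PER_SERVER = 3_000    # load-tested requests/second/server (from the slides)

print(servers_needed(VARNISH_RPS, 4, PER_SERVER))    # 2  (4x breaking-news spike)
print(servers_needed(VARNISH_RPS, 10, PER_SERVER))   # 5  (2012 Election Night scale)
print(backend_rps(VARNISH_RPS, 0.91), backend_rps(VARNISH_RPS, 0.96))  # 135 60
```

Under these assumptions, eight servers per data center leave headroom even for a repeat of the 10x peak, and the drop in hit rate from ~96% to ~91% more than doubles the requests that reach the backend.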

8 Servers? Why?!

Because:

● Biggest spike ever was 10x (2012 Election Night)
● 2 hypervisors => even number of server instances
● Takes too long for us to dynamically provision
● We can afford to stay over-provisioned

Yes, this causes extra backend network traffic.

Scaled out for resilience, scaling up for performance.

Next Steps for Us

1. Install Varnish Cache Plus 4
2. Use the Varnish Plus tools for monitoring
3. Replace CREAM with VHA (Varnish High Availability)

Thank You

Adam E. Falk
falkae@nytimes.com

@xenograg
adamfalk.com xenograg.com

We’re hiring: nytimes.com/careers

@NYTDevs | #timesopen | developers.nytimes.com