Just another bughunt? Tools to improve your site without nuking it from orbit #DPA11 Ken Newquist (@knewquist) | Charles Fulton (@mackensen)

DESCRIPTION

It's not the bugs you know that kill a website. It's the ones you can't see, lurking just out of sight, that get you. Learn how Lafayette College identified the Lovecraftian code horrors lurking beneath its feet with tools like Splunk (server log analysis), OSSEC (server-side bad behavior monitor) and SiteImprove (web page auditing tool) and then surgically eliminated the problems. Examples include PHP scripts spewing error notices into logs, undiscovered CAS authentication failures, and thumbnail generation scripts that choke on large files.

Page 1: Just another bughunt

Just another bughunt?

Tools to improve your site without nuking it from orbit

#DPA11

Ken Newquist (@knewquist) | Charles Fulton (@mackensen)

Page 2: Just another bughunt

Who we are

Ken Newquist
Director, Web Applications Development
Lafayette College

Charles Fulton
Senior Web Applications Developer
Lafayette College

Page 3: Just another bughunt

Rebuild or Fix?

● Your website’s problems may seem intractable

● The temptation to nuke the bugs and start fresh is strong

● We’ve found tools that identify the problems so we can surgically eliminate them

○ (and find a few issues we didn’t know about in the process)

Page 4: Just another bughunt

Tools

Page 5: Just another bughunt

Siteimprove

● Crawls web presence

● Reports broken links and common misspellings

● Shows changes over time

● Pretty graphs!

Page 6: Just another bughunt

Pretty graph!

Page 7: Just another bughunt

Splunk

● Log aggregation

● Real-time monitoring

● Rich analysis

● More pretty graphs!

Page 8: Just another bughunt

Another pretty graph!

Page 9: Just another bughunt

Nagios

● Real-time monitoring

● Defines a baseline of system performance

● Does not detect presence of dinosaurs

Page 10: Just another bughunt

Dinosaurs!

Page 11: Just another bughunt

OSSEC

● Log-based intrusion detection system

● Define states of acceptable behavior

● No pretty graphs

Page 12: Just another bughunt

Not a pretty graph :/

Page 13: Just another bughunt

Discovering your web presence

● Define expected behavior with OSSEC & Nagios

● Test expectations with Siteimprove & Splunk

● Here be monsters

Page 14: Just another bughunt

Investigations

Page 15: Just another bughunt

The Lost Thumbnails

● Site: Moodle

● Tools: Splunk, OSSEC

● Outcome: Improved Apache configuration

Page 16: Just another bughunt

Sky falling!

● Splunk reported ~400 HTTP 500 (Internal Server Error) responses within a few minutes

● Also showed concentrated bursts of 404 errors when viewing resources

● Concern within department that sky was falling
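
A burst like the one above can also be spotted without Splunk. The following is a minimal sketch, assuming Apache's default common/combined log format; the sample lines, the function names, and the threshold are hypothetical and not taken from Lafayette's setup.

```python
import re
from collections import Counter

# Captures the timestamp (truncated to the minute) and the status code.
# Assumes Apache's default common/combined LogFormat.
LOG_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]+\] "[^"]*" (\d{3}) '
)

def count_status_by_minute(lines, statuses=("500", "404")):
    """Return a Counter keyed by (minute, status) for the given statuses."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group(2) in statuses:
            counts[(match.group(1), match.group(2))] += 1
    return counts

def find_bursts(counts, threshold):
    """Return sorted (minute, status, count) tuples at or above threshold."""
    return sorted((m, s, n) for (m, s), n in counts.items() if n >= threshold)

# Hypothetical sample data, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2015:13:55:01 -0400] "GET /a.jpg HTTP/1.1" 500 512',
    '10.0.0.2 - - [01/Jan/2015:13:55:09 -0400] "GET /b.jpg HTTP/1.1" 500 512',
    '10.0.0.3 - - [01/Jan/2015:13:55:30 -0400] "GET /c.jpg HTTP/1.1" 404 209',
    '10.0.0.1 - - [01/Jan/2015:13:56:02 -0400] "GET /index.php HTTP/1.1" 200 4096',
]

bursts = find_bursts(count_status_by_minute(SAMPLE), threshold=2)
print(bursts)
```

Grouping by minute keeps the sketch simple; a real alert would probably use a sliding window and a threshold tuned to normal traffic.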

Page 17: Just another bughunt

Sky not falling!

● System ran out of memory generating thumbnails from massive images; threw 500s

● Preview of missing images generated the 404s

Page 18: Just another bughunt

Outcomes

● Memory limits were not reasonable

● Users do not report catastrophic errors

Page 19: Just another bughunt

Comments

● Site: WordPress

● Tools: Splunk, OSSEC

● Outcome: WordPress core fixes

Page 20: Just another bughunt

What Lies Beneath

● 500 errors are reserved for server issues

● WordPress has notions of its own

○ Double-submitted comment? 500 error

○ Missing a required field? 500 error

○ Blank comment? 500 error

● OSSEC would ban all of these for bad behavior

Page 21: Just another bughunt

https://github.com/bigcompany/know-your-http

Page 22: Just another bughunt

Outcomes

● Learned reasonable mistakes can yield unreasonable error codes

● Hacked core to return 200s and 400s instead

● Core is discussing what to do

○ https://core.trac.wordpress.org/ticket/11286
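
The idea of answering user mistakes with client-side status codes instead of 500s can be sketched as follows. This is an illustration in Python, not the WordPress patch: the function name `comment_status` and the exact code chosen for each case are assumptions, since the slides only say that 200s and 400s replaced the 500s.

```python
from http import HTTPStatus

def comment_status(author, email, content, is_duplicate=False):
    """Pick an HTTP status for a comment submission.

    Hypothetical mapping: mistakes made by the user get a 4xx code,
    and 500 is left for genuine server failures.
    """
    if not content.strip():
        return HTTPStatus.BAD_REQUEST   # blank comment
    if not author.strip() or not email.strip():
        return HTTPStatus.BAD_REQUEST   # missing a required field
    if is_duplicate:
        return HTTPStatus.CONFLICT      # double-submitted comment
    return HTTPStatus.OK

print(int(comment_status("Jane", "jane@example.com", "")))
print(int(comment_status("Jane", "jane@example.com", "Hi", is_duplicate=True)))
print(int(comment_status("Jane", "jane@example.com", "Hi")))
```

With a mapping like this, a log monitor that treats 5xx responses as a sign of trouble no longer fires on ordinary typos.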

Page 23: Just another bughunt

Revenge of the Base Theme

● Site: WordPress

● Tools: Siteimprove

● Outcome: WordPress theme fix; Apache configuration change

Page 24: Just another bughunt

March 10: the day the links broke

Page 25: Just another bughunt

Nothing to see here … oh wait--

● Developer dismissed initial reports of login issues as user error

● Then Siteimprove said we had 1,800 new broken links

● A two-character change in RHEL defaults for httpd.conf broke WordPress

Page 26: Just another bughunt

Lessons

● Small changes have vast consequences

● Documentation is doubleplusgood

Page 27: Just another bughunt

The Incredible Shrinking Provost

● Site: Drupal

● Tools: Splunk

● Outcome: Cleaned data in ERP system

Page 28: Just another bughunt

Who’s the fairest of them all?

● The directory passes the search query via a GET parameter

● Splunk told us our associate provost, “Jane Doe”, was most-searched by an order of magnitude
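
Because the query travels in a GET parameter, it shows up in the access log and can be tallied. Here is a minimal sketch of that tally; the parameter name `q`, the `/directory/search` path, and the sample requests are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

def top_search_terms(request_paths, param="q", n=3):
    """Count search terms found in the given query-string parameter."""
    counts = Counter()
    for path in request_paths:
        query = urlsplit(path).query
        for term in parse_qs(query).get(param, []):
            # Normalize case and whitespace so variants count together.
            counts[term.strip().lower()] += 1
    return counts.most_common(n)

# Hypothetical request paths, as they might appear in an access log.
PATHS = [
    "/directory/search?q=Jane+Doe",
    "/directory/search?q=jane%20doe",
    "/directory/search?q=John+Smith",
    "/directory/search?q=Jane+Doe&page=2",
]

print(top_search_terms(PATHS))
```

A term that dominates the list by an order of magnitude is worth searching for yourself, which is what the next slide does.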

Page 29: Just another bughunt

...we searched for “Jane Doe”...

...and the search returned...

...NOTHING!

Page 30: Just another bughunt

Lessons

● “Jane A. B. Doe” !== “Jane Doe”

● Data lies
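
The mismatch can be illustrated with a short sketch. It only demonstrates why an exact comparison fails on the stored name; the actual outcome reported in this deck was cleaning the data in the ERP system, and both function names here are hypothetical.

```python
def exact_match(query, full_name):
    """Case-insensitive comparison of the whole string."""
    return query.lower() == full_name.lower()

def token_match(query, full_name):
    """True if every word in the query appears in the stored name."""
    name_tokens = {t.strip(".").lower() for t in full_name.split()}
    return all(t.strip(".").lower() in name_tokens for t in query.split())

stored = "Jane A. B. Doe"
print(exact_match("Jane Doe", stored))
print(token_match("Jane Doe", stored))
```

Token matching tolerates extra middle initials, but it is no substitute for consistent source data.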

Page 31: Just another bughunt

Dumpster fire

Page 32: Just another bughunt

The Virtual Tour

● Site: Custom app

● Tools: Splunk

● Outcome: Fixed PHP bugs

Page 33: Just another bughunt

Pretty graphs!

● 238,908 errors...in three days

● (We didn’t expect that)

Page 34: Just another bughunt

Fixed it!

Page 35: Just another bughunt

Outcomes

● No one cares that we fixed the Virtual Tour

○ (we feel better though)

Page 36: Just another bughunt

Mr. Foo and Mr. Bar

● Site: WordPress

● Tools: Splunk

● Outcome: Disproved long-standing alleged bug

Page 37: Just another bughunt

I swear I wasn’t there!

● Various reports over the years alleging that WordPress improperly reported another user was editing a post

● Much speculation and theorizing in absence of facts

Page 38: Just another bughunt

Outcomes

● People are wrong on the Internet

Page 39: Just another bughunt

The Cache That Wouldn’t Die

● Site: WordPress

● Tools: Nagios

● Outcome: Database size reduced by two-thirds

Page 40: Just another bughunt

Doom at 11….

● Nagios had concerns

● MySQL ran out of disk space

● Size of WordPress DB tripled in two weeks

Page 41: Just another bughunt

Pretty terminal dumps?

SELECT option_name FROM wp_190_options WHERE option_name LIKE "displayed_gallery%";
...
| displayed_gallery_rendering_ffffb5e48845fbb7b3347244f8aa06d4 |
| displayed_gallery_rendering_ffffd6d9f2ab40195295c70f775b0ee8 |
| displayed_gallery_rendering_ffffe1416b8d969e25ec7a6094282bbe |
| displayed_gallery_rendering_ffffe8e4a0c399605f434bd51be2d9d7 |
+--------------------------------------------------------------+
722141 rows in set (2.28 sec)
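
The same check can be sketched against a throwaway database. This example uses Python's built-in `sqlite3` rather than MySQL so that it is self-contained; the table and prefix names come from the query above, while the rows and the helper `count_by_prefix` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_190_options (option_name TEXT, option_value TEXT)")

# Two ordinary options plus five made-up cache entries.
rows = [("siteurl", "x"), ("blogname", "x")]
rows += [(f"displayed_gallery_rendering_{i:032x}", "cached") for i in range(5)]
conn.executemany("INSERT INTO wp_190_options VALUES (?, ?)", rows)

def count_by_prefix(conn, prefix):
    """Count options whose name starts with prefix.

    Note that "_" is a single-character wildcard in LIKE, so the match
    is slightly looser than a literal prefix comparison.
    """
    sql = "SELECT COUNT(*) FROM wp_190_options WHERE option_name LIKE ?"
    (count,) = conn.execute(sql, (prefix + "%",)).fetchone()
    return count

total = conn.execute("SELECT COUNT(*) FROM wp_190_options").fetchone()[0]
cached = count_by_prefix(conn, "displayed_gallery")
print(f"{cached} of {total} rows are gallery cache entries")
```

Comparing the prefixed count with the table total shows quickly how much of the table a single plugin's cache occupies.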

Page 42: Just another bughunt

…Salvation at Noon

● The Google Mini found something terrible lurking in club websites

● NextGEN Gallery bug caused near-endless crawl by the mini

● Code bug meant the cache never expired

Page 43: Just another bughunt

Outcomes

● NextGEN Gallery has stability issues

● Listen to Nagios

● It’s turtles all the way down

Page 44: Just another bughunt

Attack of the Python Script

● Site: WordPress

● Tools: Nagios, Splunk

● Outcome: Quickly identified source of massive load event

Page 45: Just another bughunt

Traffic Jam!

● Load on a server spiked at 800%

● Seemed bad

● Nagios had more concerns

Page 46: Just another bughunt

Hello there!

● Splunk real-time monitoring revealed top client IPs

● We’re very popular with a misconfigured IIS Server in Oregon and its “Python-urllib/3.4” script
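
A rough equivalent of that "top clients" view can be sketched from raw logs. It assumes Apache's combined log format, where the user agent is the last quoted field; the sample lines use documentation-range IP addresses and are hypothetical, as is the function name.

```python
import re
from collections import Counter

# First field is the client IP; the last quoted field is the user agent.
# Assumes Apache's combined LogFormat.
LINE_RE = re.compile(r'^(\S+) .* "([^"]*)"$')

def top_clients(lines, n=1):
    """Return the n most frequent (ip, user_agent) pairs with counts."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts.most_common(n)

# Hypothetical sample data, for illustration only.
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2015:10:00:01 -0400] "GET / HTTP/1.1" 200 512 "-" "Python-urllib/3.4"',
    '203.0.113.7 - - [01/Jan/2015:10:00:01 -0400] "GET /a HTTP/1.1" 200 512 "-" "Python-urllib/3.4"',
    '198.51.100.4 - - [01/Jan/2015:10:00:02 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jan/2015:10:00:02 -0400] "GET /b HTTP/1.1" 200 512 "-" "Python-urllib/3.4"',
]

print(top_clients(SAMPLE))
```

Pairing the IP with the user agent makes a scripted client stand out from ordinary browser traffic.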

Page 47: Just another bughunt

Outcomes

● Banned the IP on the proxy

● Began developing rate-limiting rules for OSSEC
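
The logic behind a rate-limiting rule can be sketched in a few lines. This is plain Python and not OSSEC rule syntax; the class name `RateWatcher`, the limit, and the window size are hypothetical.

```python
from collections import defaultdict, deque

class RateWatcher:
    """Flag clients that exceed max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ip -> recent request timestamps

    def record(self, ip, timestamp):
        """Record a request; return True if ip is now over the limit."""
        hits = self._seen[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests

watcher = RateWatcher(max_requests=3, window_seconds=10)
results = [watcher.record("203.0.113.7", t) for t in (0, 1, 2, 3)]
print(results)
```

A sliding window avoids the edge effect of fixed intervals, where a burst that straddles two intervals can slip under the limit.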

Page 48: Just another bughunt

Alternatives

Page 49: Just another bughunt

Bughunting on the cheap

W3C Link Checker

● Reports on broken links to a specified depth

● http://validator.w3.org/checklink

Google Webmaster Tools

● Details on broken links and server errors

● https://www.google.com/webmasters/tools/

Page 50: Just another bughunt

More options

● Bureau of Internet Accessibility

○ Cheaper than Siteimprove

○ Broken link and accessibility reports

○ http://www.boia.org

● Google Analytics

○ Identify high-traffic broken pages

○ http://google.com/analytics

● vim | grep

○ Eyeballing your logs can’t hurt

Page 51: Just another bughunt

Conclusions

Page 52: Just another bughunt

Did we really fix all those errors?

Or is logging broken?

Page 53: Just another bughunt

Takeaways

● Data are free

● Bugs are hard to find

● Reports are expensive

● Good reports make finding bugs easy

● You can improve your site without rebuilding it from scratch

● You will find more bugs than you can fix

Page 54: Just another bughunt

Page 55: Just another bughunt

Anatomy of a Redirect

● Tool: Splunk

● Forthcoming from Lafayette College

● WordPress tries to be helpful!

Page 56: Just another bughunt

Join the discussion at https://core.trac.wordpress.org/ticket/16557!

Page 57: Just another bughunt

Questions?

Ken Newquist ● [email protected] ● @knewquist

Charles Fulton ● [email protected] ● @mackensen