DNC-Big Data and Data Mining in 2012 US Election

Azamat Kamzin, Mandar Bhide

Overview

• Highlights of Narwhal

• System Organization

• Classification

• Associative patterns

• Predictive models

• References

Highlights

• Codename: Narwhal

• Budget: $100 million

• Lead Developer: Scott VanDenPlas

• Chief Analytics Officer: Dan Wagner

• Team: Approx. 200 members

• General objective:

  o Bring together information on voters, supporters, and donors in one place (unlike in 2008, when the information was split across 6 different servers/vendors)

• It was one of the 20 largest consumer/customer databases ever made

  o Size, as per a VanDenPlas tweet: “4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests...”

  o Service provider: Amazon Cloud

System Organization

• Data collection/enrichment

  o Automated call survey, 1.2 million calls per day

  o Tracking visitors' online behavior using cookies

• Dreamcatcher

  o Level of support for Obama

  o Likelihood to vote

  o Estimated donation amount

• Narwhal

  o 2008 voter databases

  o Private/public databases

• Call/email to motivate the voter

• Best channel and time slot to advertise

• Directing volunteers to the right door

• Right email ad to the right person

Dreamcatcher: Voter Classification

• Voters were classified into 4 categories

Dreamcatcher: Association Patterns

• Output: Detailed profile of voters

• Inputs are attributes of each individual stored in Narwhal

  o Voting history

  o Social media likes, comments

  o Volunteering

  o Magazine subscriptions

  o Registered car

  o Insurance data

  o Individual private information from firms like Aristotle

Predictive Models

• A/B testing: To understand which image or text draws a higher user response

  o Ex. “Learn More” garnered 18.6 percent more signups per visitor than the default of “Sign Up.”

• Time series analysis: To understand approval and disapproval trends

• Regression

  o Used to calculate electoral votes (dependent variable) based on top issues such as the economy, healthcare, etc.

  o Packages used were SAS, R, and MATLAB

• Decision trees

  o We don't believe they used decision trees, because of the large number of attributes, which differ for each individual
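
As a rough illustration of the A/B testing bullet above, the sketch below computes the relative lift between two button labels and checks it with a two-proportion z-test. The visitor and signup counts are invented so that the lift comes out to the 18.6 percent quoted on the slide; they are not campaign data. The function names are hypothetical, and the z-test is a common textbook choice here, not a method the sources attribute to the campaign.

```python
import math


def relative_lift(control_rate, variant_rate):
    """Relative improvement of the variant's signup rate over the control's."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_z_test(control_signups, control_visitors,
                          variant_signups, variant_visitors):
    """Return (z, two-sided p-value) for the difference in signup rates."""
    p_control = control_signups / control_visitors
    p_variant = variant_signups / variant_visitors
    # Pooled rate under the null hypothesis that both labels perform equally.
    pooled = (control_signups + variant_signups) / (control_visitors + variant_visitors)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors)
    )
    z = (p_variant - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (assumption): 10,000 visitors shown each label.
control_visitors, control_signups = 10_000, 1_000   # "Sign Up" (default)
variant_visitors, variant_signups = 10_000, 1_186   # "Learn More"

lift = relative_lift(control_signups / control_visitors,
                     variant_signups / variant_visitors)
z, p_value = two_proportion_z_test(control_signups, control_visitors,
                                   variant_signups, variant_visitors)

print(f"relative lift: {lift:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

With counts this large, a lift of that size is very unlikely to be noise. With only a few hundred visitors per label, the same relative lift could easily fail the test.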

References

• Michael Scherer (November 8, 2012). “How Obama's data crunchers helped him win”. Retrieved from http://www.cnn.com/2012/11/07/tech/web/obama-campaign-tech-team

• Sasha Issenberg (December 19, 2012). “How President Obama’s campaign used big data to rally individual voters”. Retrieved from http://www.technologyreview.com/featuredstory/509026/how-obamas-team-used-big-data-to-rally-voters/
