DNC-Big Data and Data Mining in 2012 US Election

Azamat Kamzin, Mandar Bhide

Overview

• Highlights of Narwhal

• System Organization

• Classification

• Associative patterns

• Predictive models

• References

Highlights

• Codename: Narwhal

• Budget: $100 million

• Lead Developer: Scott VanDenPlas

• Chief Analytics Officer: Dan Wagner

• Team: Approx. 200 members

• General Objective:
    o Bring together information on voters, supporters, and donors in one place (unlike in 2008, when the information was split across 6 different servers/vendors)

• It was one of the 20 largest consumer/customer databases ever made
    o Size, as per VanDenPlas's tweet: “4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests...” (Service provider: Amazon Cloud)
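As a rough illustration of the "one place" objective, the sketch below merges records from separate sources into a single profile per person. The `person_id` key, the field names, and the sample records are all hypothetical; the slides do not describe how Narwhal actually linked its data.

```python
from collections import defaultdict


def consolidate(*sources):
    """Merge per-source records into one profile per person.

    Each source is a (name, records) pair. Records are dicts that carry a
    hypothetical shared key, 'person_id'.
    """
    profiles = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            person_id = record["person_id"]
            for field, value in record.items():
                if field != "person_id":
                    # Prefix with the source name so fields never collide.
                    profiles[person_id][f"{source_name}.{field}"] = value
    return dict(profiles)


# Made-up sample data standing in for formerly separate systems.
voters = [{"person_id": 1, "registered": True},
          {"person_id": 2, "registered": True}]
donors = [{"person_id": 1, "total_donated": 50}]
volunteers = [{"person_id": 2, "shifts": 3}]

profiles = consolidate(("voters", voters),
                       ("donors", donors),
                       ("volunteers", volunteers))
```

A real system would also need to match people who appear under different identifiers in each source, which this sketch sidesteps by assuming a shared key.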

System Organization

Data Collection/Enrichment

• Automated call surveys: 1.2 million calls per day

• Tracking visitors' online behavior using cookies

Narwhal

• 2008 voter databases

• Private/public databases

Dreamcatcher

• Level of support for Obama

• Likelihood to vote

• Estimated donation amount

Outreach

• Call/email to motivate the voter

• Best channel and time slot to advertise

• Directing volunteers to the right door

• Right email ad to the right person
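One way per-voter estimates could feed outreach decisions is sketched below. The scoring rule (support multiplied by the chance of not voting) is an assumption made purely for illustration, as are the `VoterScore` fields and the sample values; the slides do not state how the campaign combined these scores.

```python
from dataclasses import dataclass


@dataclass
class VoterScore:
    voter_id: int
    support: float   # estimated level of support, 0..1 (hypothetical scale)
    turnout: float   # estimated likelihood to vote, 0..1 (hypothetical scale)


def contact_priority(voter):
    # Assumed heuristic: a reminder matters most for likely supporters
    # who are unlikely to vote on their own.
    return voter.support * (1.0 - voter.turnout)


def rank_for_contact(voters, top_n):
    """Return the top_n voters by descending contact priority."""
    return sorted(voters, key=contact_priority, reverse=True)[:top_n]


# Made-up scores for three voters.
voters = [
    VoterScore(voter_id=1, support=0.9, turnout=0.95),
    VoterScore(voter_id=2, support=0.8, turnout=0.30),
    VoterScore(voter_id=3, support=0.2, turnout=0.10),
]
ranked = rank_for_contact(voters, top_n=2)
```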

Dreamcatcher: Voter Classification

• Voters were classified into 4 categories

Dreamcatcher: Association Patterns

• Output: Detailed profile of voters

• Inputs are attributes of each individual stored in Narwhal:
    o Voting history
    o Social media likes, comments
    o Volunteering
    o Magazine subscriptions
    o Registered car
    o Insurance data
    o Individual private information from firms like Aristotle
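Association patterns are commonly measured with support and confidence. The sketch below computes both over a handful of made-up attribute sets. The attribute names and the rule are hypothetical, and nothing here is taken from Dreamcatcher itself.

```python
def support(profiles, itemset):
    """Fraction of profiles that contain every attribute in itemset."""
    itemset = set(itemset)
    hits = sum(1 for profile in profiles if itemset <= profile)
    return hits / len(profiles)


def confidence(profiles, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the profiles."""
    base = support(profiles, antecedent)
    both = support(profiles, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Made-up profiles: each set holds the attributes recorded for one person.
profiles = [
    {"voted_2008", "volunteered", "magazine_subscriber"},
    {"voted_2008", "volunteered"},
    {"voted_2008"},
    {"magazine_subscriber"},
]

# Rule under test: volunteered -> voted_2008
rule_support = support(profiles, {"volunteered", "voted_2008"})
rule_confidence = confidence(profiles, {"volunteered"}, {"voted_2008"})
```

In this toy data, half of the profiles contain both attributes, and every volunteer also voted in 2008, so the rule has high confidence.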

Predictive Models

• A/B Testing: To understand which image or text draws a higher user response
    o Ex. “Learn More” garnered 18.6 percent more signups per visitor than the default of “Sign Up.”

• Time Series Analysis: To understand approval and disapproval trends
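The A/B comparison can be sketched as a relative lift plus a two-proportion z-test. Only the 18.6 percent figure comes from the slide. The visitor and signup counts below are invented so that the lift works out to that value, and the z-test is a standard textbook check, not a method the slides attribute to the campaign.

```python
import math


def relative_lift(conv_a, n_a, conv_b, n_b):
    """Relative improvement of variant B's conversion rate over variant A's."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    return (rate_b - rate_a) / rate_a


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / std_err


# Hypothetical counts: A = "Sign Up" (default), B = "Learn More".
signups_a, visitors_a = 1000, 10000
signups_b, visitors_b = 1186, 10000

lift = relative_lift(signups_a, visitors_a, signups_b, visitors_b)
z = two_proportion_z(signups_a, visitors_a, signups_b, visitors_b)
```

A z value above roughly 1.96 would indicate a difference that is significant at the 5 percent level for a two-sided test.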

• Regression
    o Used to calculate electoral votes (the dependent variable) based on top issues such as the economy, healthcare, etc.
    o Packages used were SAS, R, and MATLAB

• Decision Trees
    o We don’t believe they used decision trees, due to the large number of attributes, which differ for each individual
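A regression of this kind can be sketched as ordinary least squares with several predictors. The example below is in plain Python for illustration only (the slide names SAS, R, and MATLAB as the packages used). The issue scores and outcome values are made up, and the model form is an assumption rather than the campaign's actual specification.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations."""
    # Prepend an intercept column of ones.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, row):
    """Apply fitted coefficients (intercept first) to one observation."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Made-up data: (economy score, healthcare score) -> outcome,
# generated exactly from y = 100 + 20*economy + 10*healthcare.
issue_scores = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
outcomes = [140, 150, 200, 210, 250]

coeffs = fit_ols(issue_scores, outcomes)
forecast = predict(coeffs, (2, 2))
```

Because the toy outcomes were generated without noise, the fit recovers the generating coefficients; real polling data would leave residual error.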

References

• Michael Scherer (November 8, 2012). “How Obama's data crunchers helped him win”. Retrieved from http://www.cnn.com/2012/11/07/tech/web/obama-campaign-tech-team

• Sasha Issenberg (December 19, 2012). “How President Obama’s campaign used big data to rally individual voters”. Retrieved from http://www.technologyreview.com/featuredstory/509026/how-obamas-team-used-big-data-to-rally-voters/