
CONTENTS

The Big Interview - p.13 We spoke to Ashok Srivastava about the impact of big data at NASA and on the wider business community, his presentation in Boston and data breakthroughs

Bridging the Gap - p.5 The current big data skills gap is growing. How can the industry and educators approach and overcome the problem?

Obama’s Big Data win - p.10 With Obama’s recent election victory, we discuss how big data and data mining were vital to his victory

A Chat with Drew Linzer - p.7 Drew Linzer speaks to us about his use of data and analytics to predict the election result four months before it happened

Big Data Innovation Keeping you up to date with all the latest Big Data Tech

WANT TO SEE YOURSELF HERE?

We have several advertising opportunities available to cater to all budgets and can offer bespoke packages to suit you and your business.

Contact Pip at [email protected] for more information.

Welcome to this issue of Big Data Innovation.

As we approach the holidays I thought I would create a present for all of you making big data and analytics one of the most exciting industries to be involved with.

We hope you enjoy the content of this issue. We have a fantastic interview with Ashok Srivastava, following on from the success of his impressive presentation in Boston and ahead of his highly anticipated presentation in San Francisco.

In addition to this, we discuss how to bridge the ever-growing data skills gap within the industry.

Drew Linzer also talks to us about his algorithms for the 2012 elections that saw him correctly predict the winning margins 4 months before the polls even opened.

We hope you enjoy this edition of Big Data Innovation as much as we enjoyed making it.

GEORGE HILL, CHIEF EDITOR

LETTER FROM THE EDITOR

HAVE AN IDEA?

What would you like to see our writers covering, or do you have any innovative Big Data articles that you want to contribute?

For more information contact George at [email protected]


WHAT’S BIG IN BIG DATA?

Welcome to this edition of Big Data Innovation where we will be exploring all things big data. In this issue we will be talking to the chief scientist at NASA, discussing the ways to bridge the big data skills gap and how Obama utilized big data during his election campaign.

Gil Press @GilPress

New #DataScience institute promises to stimulate research, NY economy

http://smrt.io/SrrCIN

BIG DATA WORLD TOUR @IE_BigData

Big Data Set to Explode as 40 Billion New Devices Connect to Internet

40,000,000,000

http://onforb.es/VQBo7p

What name would you give to a group of people working in Big Data?

Best Answers

• Data Whisperers
• Rock stars
• Data Divas
• Data Detectives
• After a while, you can call them visually impaired! Mining all that data sure is hard on the eyes!

Follow Us at: @IE_BigData @iegroup

Gartner Predicts Big Things

Gartner has predicted that Big Data will have driven $28 billion of IT spend through 2012. Currently the biggest spend on these solutions is not actually in their use but in the adaptation of existing systems to handle it.

It also coincides with companies looking to utilize social media and having a way of tracking the ways in which users are interacting with their

Obama’s Big Data Success

The recent election win by President Obama has been attributed to his thorough use of Big Data in order to target potential voters and make sure his core voters are voting.

With a data team five times the size of that in 2008 it shows that Big Data has become commonplace in successful election campaigns and bodes well for the future of the industry as a whole.

Data Expected to Create 6 million new jobs

According to a new study by the Gartner group, Big Data is expected to create around 4.4 million jobs in the tech industry and thus 6 million across other industries too.

Around 1.9 million of these tech jobs are likely to be in the US, with the others spread out globally.


More Business Insight through Better and Easier Data Analysis

Combine Big Data from multiple sources - put it at your fingertips using our interactive Trillion-Row Spreadsheet and get better answers in seconds.

Join hundreds of blue-chip companies who have found an easy way to get more insight out of more data. For well over a decade our Cloud-based platform has pushed the limits of analytics on large amounts of data. 1010data enables enterprises to combine, analyze, and share any number of huge data sets across corporate boundaries, so that new insights can inform strategy like never before.

From routine reporting to advanced analytics, the 1010data system allows businesses like yours to easily hone your tactics and strategy, while reducing technology overhead, costs, and risk.

www.1010data.com

Learn more about 1010data, download our Gartner reports here: 1010data.com/analyst-report-2



Big Data is one of the quickest growing industries in the world and companies are clamoring to get involved to reap the rewards of this new analytics revolution.

The new technique has many advocates - from President Obama, who has leveraged it famously during his last two election campaigns, to Warren Buffett, who made his first ever high tech investment in IBM thanks to its new focus on big data.

One of the issues that this new industry currently faces is not that people fail to understand the end results, but that there is a real lack of qualified and experienced people to use the software and carry out the analysis that produces those results.

Gartner recently claimed that Big Data will create around 4.4 million IT jobs globally and is likely to surpass $3.7 trillion in company spend in 2013. These are hugely significant numbers in the context of the global economy and especially in countries that are still feeling the pinch from the recession.

The disappointing aspect of this report is that they also forecast that there will be a significant shortage of qualified professionals to fill these potential positions, meaning that the investment that will be made may not reach its full potential. According to the report there is likely to be only one third of these IT jobs filled and an even smaller number will be available for the actual number of analysts needed.

So why, with this huge potential for global economic improvement, are we falling short of these targets?

There are two main reasons for this.

The current public and private schooling systems are not adequately preparing graduates.

Although universities are increasing the offerings in these areas there is still a significant gap in the numbers of courses offering the level of skills required to go straight from the classroom to the office.

The best professors are the people who have worked within the industry and know the shortfalls that graduates will have and the ways to overcome them. It is great being able to build models and collect data, but in reality, without the analytical skills to back these up they are not enough.

“Big Data could create 4.4 million IT jobs globally”

“We can expect a 40-60 per cent projected annual growth in the volume of data generated”

THE BIG DATA SKILLS GAP


Although analytics has been around for a while, big data has taken it to a whole new level, and graduates need a different set of skills to work with it effectively. The industry has grown out of analytics, meaning that the people who are using big data have only been doing it in its current form for 5 years at most. It is an exciting time to work in the area, so why would people move professions?

The industry is moving too quickly

Would you have heard of Hadoop 5 years ago? Perhaps if you were on the crest of the wave you would have heard the name; you may even have had some practical experience of it. One thing that is certain is that you would not have had the kind of knowledge needed to create a university course on it.

People who are graduating right now would therefore not have had a course designed around the technologies they are likely to be using. It is the equivalent of starting a social media marketing course in 2009, learning everything there is to know about MySpace and graduating to use Facebook and Twitter.

So what are the solutions?

This question has been posed on several blogs and LinkedIn groups and hundreds of people have made their points.

Three major points have come out of this:

Improve Education

You may not know the ins and outs of Hadoop and you may not know your quants from your Java, but if you come out of education with the basics of what you need to master these then you are likely to be in a good position to excel in the future.

Placements and Patience

The likelihood is that when people graduate they know the books, the words and the authors but have little knowledge of what is behind this knowledge. Employers need to have the patience to teach and nurture rather than expecting results straight away. Due to the speed at which the industry is moving, graduates should be seen as clay that needs to be moulded.

Employing people based on potential rather than what they have on their CV is going to be vital. Just as the best carpenters and bricklayers learn their trade on the building site, the best analysts will learn their trade through being surrounded by experts and numbers.

Passion

Einstein did not go to the best university, he was heavily dyslexic and could barely string together a sentence but he is now known as one of the most influential scientists in history. The reason for this is not purely down to talent and intelligence but also his passion and perseverance for the subject.

If we can find people who have the passion to succeed in analytics and have a genuine drive to succeed then they should be accepted with open arms.

CHRIS TOWERS, BIG DATA LEADER

Big Data will drive $3.7 trillion in IT spend in 2013


Drew Linzer is the analyst who predicted the results of the election four months in advance. His algorithms even correctly predicted the exact number of votes and the winning margin for Obama.

Drew has documented his analytical processes in his blog votamatic.org which details the algorithms used, their selections and also the results.

He also appeared on multiple national and international news programs due to his results.

I caught up with him to discuss not only his analytical techniques but also his opinions on what was the biggest data driven election ever.

George: What kind of reaction has there been to your predictions?

Drew: Most of the reaction has focused on the difference in accuracy between those of us who studied the public opinion polls, and the "gut feeling" predictions of popular pundits and commentators. On Election Day, data analysts like me, Nate Silver (New York Times FiveThirtyEight blog), Simon Jackman (Stanford University and Huffington Post), and Sam Wang (Princeton Election Consortium) all placed Obama's reelection chances at over 90%, and correctly foresaw 332 electoral votes for Obama as the most likely outcome. Meanwhile, pundits such as Karl Rove, George Will, and Steve Forbes said Romney was going to win -- and in some cases, easily. This has led to talk of a "victory for the quants" which I'm hopeful will carry through to future elections.

George: How do you evaluate the algorithm used in your predictions?

Drew: My forecasting model estimated the state vote outcomes and the final electoral vote, on every day of the campaign, starting in June. I wanted the assessment of these forecasts to be as fair and objective as possible -- and not leave me any wiggle room if they were wrong. So, about a month before the election, I posted on my website a set of eight evaluation criteria I would use once the results were known. As it turned out, the model worked perfectly. It predicted over the summer that Obama would win all of his 2008 states minus Indiana and North Carolina, and barely budged from that prediction even after support for Obama inched upward in September, then dipped after the first presidential debate.

George: The amount of data used throughout this campaign both by independent analysts and campaign teams has been huge, what kind of implications does this have for data usage in 2016?

Drew: The 2012 campaign proved that multiple, diverse sources of quantitative information could be managed, trusted, and applied successfully towards a variety of ends. We outsiders were able to predict the election outcome far in advance. Inside the campaigns, there were enormous strides made in voter targeting, opinion tracking, fundraising, and voter turnout. Now that we know these methods can work, I think there's no going back. I expect reporters and campaign commentators to take survey aggregation much more seriously in 2016. And although Obama and the Democrats currently appear to hold an advantage in campaign technology, I would be surprised if the Republicans didn't quickly catch up.

George: Do you think that the success of this data driven campaign has meant that campaign managers now need to be an analyst as well as a strategist?

Drew: The campaign managers may not need to be analysts themselves, but they should have a greater appreciation for how data and technology can be harnessed to their advantage. Campaigns have always used survey research to formulate strategy and measure voter sentiment. But now there are a range of other powerful tools available: social networking websites, voter databases, mobile smartphones, and email marketing, to name only a few. And that is in addition to recent advances in polling methodologies and statistical opinion modeling. There is a lot of innovation happening in American campaign politics right now.

“I wanted the assessment of these forecasts to be as fair and objective as possible and not leave me any wiggle room if they were wrong”

A CHAT WITH DREW LINZER

“The 2012 campaign proved that multiple, diverse sources of quantitative information could be managed, trusted”

George: You managed to predict the election results 4 months beforehand, what do you think is the realistic maximum timeframe to accurately predict a result using your analytics techniques?

Drew: About four or five months is about as far back as the science lets us go right now; and that's even pushing it a bit. Prior to that, the polls just aren't sufficiently informative about the eventual outcome: too many people are either undecided or haven't started paying attention to the campaign. The historical economic and political factors that have been shown to correlate with election outcomes also start to lose their predictive power once we get beyond the roughly 4-5 month range. Fortunately, that still gives the campaigns plenty of time to plot strategy and make decisions about how to allocate their resources.

If you are interested in hearing more from Drew you can check out his blog at votamatic.org or attend his presentation at Big Data Innovation Summit in San Francisco on April 11 & 12.

GEORGE HILL, CHIEF EDITOR

Do you want to write for Big Data Innovation? We are always looking for new innovative articles or new ideas for our writers to cover.

Contact us at [email protected] for more information

Do you want to have your company represented in the magazine? We have multiple advertisement opportunities to cater to all budgets.

Contact Pip at [email protected] for more information

Want to get involved?


In October 2012 US political commentators were claiming that the election race between Obama and Romney was going to be one of the tightest contests in living memory. Republicans were claiming a landslide victory and Democrats were constantly looking over their shoulders wondering how things had got this close.

In July 2012 Drew Linzer had predicted a 332-206 win for Obama. When asked if he had changed his views on this after Romney’s gaffes and Obama’s poor first debate showing, he did not budge on these numbers and turned out to be totally correct on the morning of November 7.

This shows the power that algorithms and numbers have had in recent elections. Many attributed George Bush’s two consecutive election wins to the power of evangelists; few can doubt the power of data in Obama’s.

Before the true campaigning had even begun, the Obama campaign team led by Jim Messina had amassed a data team five times the size of that in 2008 and were using data to fundraise in a way that had not been done before.

Emails went out inviting people to join a competition to win a dinner with Sarah Jessica Parker. When people compared the emails they realised that there were several different variants, and these variants created a much higher engagement rate with individuals, increasing the amount of money given.

In 2008 much was made of the attempts from the Obama team to utilize data mining within their campaign. However, the data mining consisted of multiple databases with little or no interaction with one another. The first thing that the new data team built was a database pulling in all of this information into a single usable source.

This source allowed the team to have up to 80 separate information points for individuals, meaning that the previous groupings done through gender, religion or ethnic group became far less relevant, as with these pieces of information the team could drill down and pinpoint individual needs.
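As a rough illustration of what pulling separate databases into one usable source can look like, the sketch below merges records from three sources into a single profile per person. It is only a minimal sketch: the source names, field names and values are invented for the example and are not taken from the campaign's actual systems.

```python
from collections import defaultdict

# Hypothetical, made-up records standing in for separate campaign databases.
# None of these field names or values come from the real campaign systems.
donations = [{"voter_id": 1, "total_donated": 50}, {"voter_id": 2, "total_donated": 0}]
volunteers = [{"voter_id": 1, "volunteer_hours": 6}]
email_activity = [{"voter_id": 2, "emails_opened": 12}, {"voter_id": 3, "emails_opened": 1}]


def build_profiles(*sources):
    """Merge rows from every source into one profile per voter_id."""
    profiles = defaultdict(dict)
    for source in sources:
        for row in source:
            voter_id = row["voter_id"]
            # Copy every field except the key into the unified profile.
            for field, value in row.items():
                if field != "voter_id":
                    profiles[voter_id][field] = value
    return dict(profiles)


profiles = build_profiles(donations, volunteers, email_activity)
print(profiles[1])
```

Each extra source simply adds more information points to the same profile, which is the property that makes drilling down to the individual possible.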

The levels of information available to Obama’s team saw significant reductions in advertising spend whilst also increasing the interaction with their audience. Traditional TV advertisements before and after local news programs were not targeting all of their audience, so the team put ads between shows popular with swing voters and the correct demographics. The information for all of these was available through the new integrated database.

Not only could they use this data for analysis on what they were doing, but it also helped them to build algorithms to help predict the likely actions of particular individuals. The use of predictive analytics is the main reason that Drew Linzer managed to predict the outcome of the election 5 months prior to the actual result and with the campaign team running 66,000 scenarios each night, they had significant insight.

Long gone were the old school ways of running campaigns based on assumptions and gut feelings; Obama’s team rarely ran with an idea without numbers and models to back up their actions. This led to strategic targeting and resource allocation which saw significant cuts in the numbers of ineffective man hours.

OBAMA’S BIG DATA ELECTION WIN

Obama’s data team was five times larger than in 2008


This campaign was the first to have a dedicated Chief Data Scientist, Rayid Ghani, who managed all of the data and the data scientist team. This meant that objectives and priorities were well directed towards particular outcomes, further increasing the efficiency of labour assignment.

There were significant breakthroughs that allowed the campaign to target people in a way that had never been done before. For instance, through complex analytics in Chicago, it was found that people who signed up for the campaign’s Quick Donate programme were likely to donate around 4 times more than donors giving through traditional means. One of the key metrics collected and measured was the ‘persuadability’ of voters: these could be potential supporters who needed persuading to vote, swing voters to persuade to support Obama, or supporters who could be persuaded to make either a financial or voluntary contribution. This metric alone saw a 14% increase in the effectiveness of targeted advertising. Aiming for these people with specific messages in areas where they were more likely to interact with them meant that the effectiveness of the message was increased significantly.

The difference between the two parties was best described by Mike Lynch in his article in Computerworld:

“For those who had the stamina to watch the election campaign unfold over 22 long months, it became not just a battle of ideologies and campaign issues, but also a rivalry between old media pundits and new media analysts”

The understanding of new media and the analytical implications of this becomes clear when comparing the use of social media in both campaigns. Taking YouTube as an example, Obama had 240,000 subscribers and 246 million pageviews compared to Romney’s 23,700 subscribers and 26 million pageviews.

This increase in social media interaction not only meant that Obama’s message was going out to 10 times more people through this medium, but also that his team were getting 10 times more data about their audience. This further increased the personalised messages and so further increased the numbers of responders.
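The "10 times" figure can be checked against the YouTube numbers quoted above. The short calculation below is only an illustrative check using those quoted figures; the variable names are ours.

```python
# Figures quoted in the article for the two campaigns' YouTube channels.
obama_subscribers, romney_subscribers = 240_000, 23_700
obama_views, romney_views = 246_000_000, 26_000_000

# How many times larger Obama's audience was on each measure.
subscriber_ratio = obama_subscribers / romney_subscribers
view_ratio = obama_views / romney_views

print(f"Subscribers: {subscriber_ratio:.1f}x, pageviews: {view_ratio:.1f}x")
# prints "Subscribers: 10.1x, pageviews: 9.5x"
```

Both ratios land close to ten, so the rounded figure holds for subscribers and pageviews alike.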

So although the media made endless speculation about the numbers of voters and the effect that gaffes, fluffs and poor debating made, in reality the outcome of the election had been predicted 5 months beforehand thanks to the kind of analytics that ended up winning Obama the election.

So where will things go from here?

With the well publicized use of analytics within this campaign there will be an inevitable trend towards using them more thoroughly in 2016 and beyond. Obama had the advantage of not having to go through the primaries this time around meaning that his data team had significantly longer to build the algorithms and databases needed to piece together a complex data driven campaign.

The Republicans had 7 months, meaning that they had little time to prepare to the same extent as the Democrats, but in 4 years' time, when there will be primaries for both parties, this will create the potential for a truly data driven race.

The question will be how this will affect the campaigns overall and what kinds of information will be available to government agencies in 4 years' time, with new legislation on data privacy being consistently passed.

The main lesson we can learn from this campaign is that big data is here and based on this evidence, will be around for a while.

14% improvement in ad effectiveness

DAVID BARTON, ANALYTICS LEADER


Big Data Innovation Summit

April 11 & 12, 2013

Westin St Francis, San Francisco

Snapshot Agenda Please check back regularly to see the latest additions to the Big Data Innovation Summit. Please note that this is a snapshot agenda, there are more confirmed speakers please see

The Big Data Innovation Summit is the largest gathering of Fortune 500 business executives leading Big Data initiatives. There are six tracks included at the Big Data Innovation Summit:

• Cross Industry
• Healthcare
• Finance
• Government
• Hadoop
• Data Visualization (optional bolt-on)
• Women in Big Data

Chief Technology Officer - Dept. of Health
Division Director - Dept. of Defense
Chief of Information - Dept. of Commerce
Principal Scientist - NASA
Director, Database Development - San Fran Police Dept.
Chief Data Officer - NYSE
Director, Decision Science - Barclaycard
VP, IT & Risk Analytics - Deutsche Bank
VP, Digital Strategy - Wells Fargo
Chief Medical Officer - GE Healthcare
Chief Data Officer - Seattle Childrens
Sr. Director, Clinical Outcomes & Analytics - Walgreens
Sr VP, Big Data - Comcast
Principal Data Scientist - eHarmony
Data Science & Analytics - Facebook
Director, Decision Science - Barclaycard
Principal Scientist - NASA
Sr. Director, BI/DW Strategy - Walgreens
Principal Data Scientist - eBay
Professor, Political Sciences
General Electric
Data Visualization Guru - Facebook
Data Scientist - Pinterest
Senior Data Scientist - LinkedIn
Data Visualization Scientist - Twitter

Tracks: Data Visualization, Big Data in Healthcare, Big Data in Finance, Big Data in Government, Big Data Innovation, Hadoop Innovation

Time slots: 08.30, 09.00, 09.30, 10.00, 10.30

For speaking opportunities please contact Chris Towers on [email protected] or +1 415 992 5339

For sponsorship opportunities please contact Pip Curtis on [email protected] or +1 415 992 5349

To attend as a delegate please contact Robert Shanley on [email protected] or +1 415 992 7605

For full schedule please scan the QR Code

theinnovationenterprise.com/events/big-data-innovation-summit-april-2013-san-francisco


NASA’S BIG DATA: THE INTERVIEW

I spoke to Ashok Srivastava about NASA, big data and his presentation at the Big Data Innovation Summit in San Francisco in April

On a sunny September morning in Boston this year, Ashok Srivastava was waiting to stand at the podium and present to a room of 600 people at the Big Data Innovation Summit - the largest, dedicated big data conference in the world. Giving his perspectives on the growth of big data, its uses in aviation safety and how his employer, NASA, have utilised and innovated through its uses, Ashok came out as one of the most popular speakers at the summit.


After the success of the presentation and the summit in general, I was lucky enough to sit down with Ashok to discuss the way that big data has changed within NASA and the success that it has had in the wider business community.

So why has big data come to prominence in the last three or four years?

Ashok argues that this is not a change that has taken place solely over the past three or four years. It is a reaction to society’s change in general to becoming more data driven. The last 25 years have seen people increasingly needing data to either make or back up decisions. Recent advancements in technology and the ability of people to analyze large data sets have meant that there has been an acceleration in the speed at which this happens. With new types of databases and the ability to record and analyze data quickly the levels of technology required have been reached.

NASA has been at the forefront of technology innovation for the past 50 years, bringing us everything from the modern computer to instant coffee. Ashok explains how NASA is still innovating today and, with the huge amounts of data that they are consuming, what is happening in big data there is going to affect businesses. For instance NASA are currently discussing the use of their big data algorithms and systems with companies ranging from medical specialists to CPG organizations. The work that they have done within data in the past few years has created the foundations allowing many companies to become successful.

One of the issues that is really affecting companies looking to adopt big data is the current gap in skilled big data professionals. The way to solve this in Ashok’s opinion is through a different set of teaching parameters. The training for these should revolve around machine learning and optimization, allowing people to learn the “trade of big data” meaning that they can learn how systems work from the basics upwards, allowing them to have full insight when analyzing.

Given the relative youth of big data, I wanted to know what Ashok thought would happen with big data at NASA in the next 10 years in addition to the wider business community.

NASA in ten years will be dealing with a huge amount of data, on a scale that is currently unimaginable. This could include things like full global observations as well as universe observations, gathering and analyzing petabytes of information within seconds.

With public money being spent on these big data projects, Ashok makes it clear that the key benefit should always boil down to ultimately providing value for the public. This is a refreshing view of NASA who have traditionally been seen as secretive due to the highly confidential nature of their operations and the lack of public understanding.

Ashok also had some pieces of advice for people currently looking to make waves in the big data world:

“It is important to understand the business problem that is being solved”

“Making sure the technologies that are being deployed are scalable and efficient”

Ashok will be presenting at the 2013 Big Data Innovation Summit in San Francisco on April 11 & 12.

“This is a change in culture that’s happened not in the last 3 or 4 years but probably over the last 25 years”

“It is important to understand the business problem that is being solved”

GEORGE HILL, CHIEF EDITOR