
DATA LAUNCH PAD We look at how pricing is allowing data to



DATA LAUNCH PAD

We look at how pricing is allowing data to launch in smaller companies


Welcome to this issue of Big Data Innovation.

We appreciate all of the emails since the last issue showing your appreciation for the magazines and the kind of content that we are producing.

In this issue we are discussing everything from the state of big data education through to the recent Gartner report and its effect on the industry.

This month we have also seen the NSA and GCHQ data scandal hitting headlines again. Senior members of both the US and UK governments have come out to express their disappointment at how information has been collected. This could well be the wake up call that the rate of technology development could have negative results in the long term if it is not correctly policed.

Companies have a role to play in this, making sure that the ways in which their data is collected are both ethical and transparent. People are increasingly worried about how their data is being collected and used, and transparency is the way to alleviate these fears.

This may not always be possible, but if there is a backlash against data collection then it will affect not only the ways in which companies use their data, but also how society views big data in general.

It is up to those using these technologies to make sure that their data collection is ethical and that the individuals whose information is held are aware of the benefits.

We hope that through publications like this, best practices can be shared and we can make sure that Big Data grows into the game changer that we all know it can be.

As always, if you are interested in advertising or writing for the magazine, contact me at [email protected]

George Hill
Chief Editor


Managing Editor: George Hill

Assistant Editors: Joanna Giddings, Chris Towers

President: Josie King

Art Director: Gavin Bailey

Advertising: Hannah [email protected]

Contributors: David Barton, Chris Towers, Tom Deutsch, Heather James, Claire Walmsley

General [email protected]

Letter From The Editor


Contents

David Barton looks at how Pamela Peele, Kirk Borne and Gregory Piatetsky-Shapiro view big data education

Chris Towers looks at how companies are bridging the big data skills gap

We look at how pricing is allowing big data to launch in smaller and smaller businesses

Heather James interviews Stephen Wolfram, the mind behind Wolfram Alpha and the Mathematica language

Claire Walmsley talks about how making data’s usage more transparent will help the industry as a whole

Tom Deutsch discusses the importance of baking data into your products


In the famous words of Tony Blair when setting out the most important aspects of how he wanted to run the UK - “Education, Education, Education”.

With increasing numbers of companies now looking at implementing big data, one of the most important factors in making this flow seamlessly is big data education and the effective use of skilled labour.

Due to the complexities involved in the education process and the incredible speed at which the industry is moving, there have been question marks around how effective this currently is.

I spoke to three of the industry's leading big data experts about their thoughts on the current state of big data education and how it could be improved.

Big Data Education

David Barton, Big Data Leader


Kirk's view is that there are two perspectives that need to be looked at in order to effectively assess current big data education initiatives. 'The phrase that I use with people is that it's an education in data as well as data in education.'

Data in Education: Kirk believes that data should be included heavily in education from a young age since, regardless of your future profession, it will be used in one way or another. This can even be done at kindergarten level: the ways in which toys are sorted by colour, type, size or shape are all forms of data siloing. Using this kind of technique early, where children can identify and explain why certain things are in certain areas, forms a strong foundation on which to add more complex ideas.

Education in Data: This early grounding at school will also allow the education in data aspect to be more thorough and successful. What many lecturers currently find is that people come into higher data education with a gap in understanding, with some teachers actually saying that students don't know what 'data' is.

The need to teach people these aspects of data throughout their lives will be vital to improving education and closing the skills gap.

Many, when looking to data for business solutions, want to find an all-encompassing data scientist. Kirk believes, however, that this is not always necessary. A business team is like any other team: you have different people in it to do different jobs. Kirk believes that companies looking for the complete-package data scientist can avoid that search by building a team in this way. Sure, there are 'all star data scientists' around, the ones who know the algorithms along with the business, sales, strategy and finance, and can run almost as a department in themselves, but they are like all stars everywhere else. Rare.

Kirk Borne, Professor of Astrophysics & Computational Science, George Mason University


"There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions."

The report predicts that this kind of skills gap will exist by 2018, but Gregory believes that we are already seeing it. Whilst using Indeed.com to look at what expertise companies are looking for, Gregory found that both MongoDB and Hadoop appear in the top 10 job trends.

"Big Data is actually rising faster than any of them. This indicates that demand for Big Data skills exceeds the supply. My experience with KDnuggets jobs board confirms it - many companies are finding it hard to get enough candidates."

There are people responding to this, however, with many universities and colleges recognising not only the shortages but also people's desire to learn. Companies looking to expand their data teams are also looking at both internal and external training.

For instance companies such as EMC and IBM are training their data scientists internally. Not only does this mean that they know that they are getting a high quality of training, but that the data scientists that they are employing are being educated in 'their ways'.

Gregory Piatetsky-Shapiro, Analytics/Data Mining Expert, Editor, KDnuggets


Whilst other industries such as finance, insurance and retail are also feeling the pinch in terms of the numbers of qualified and experienced data scientists, healthcare has been hit even harder. The reason for this, according to Pamela, is that "in healthcare whilst it is somewhat transactional delivering services, the service isn't exactly the same because the consumption and action of service varies by patient so it's much harder to deal with health data than transaction pieces which are claims or transaction data."

Of course, the only real way around this is through the ways in which we are educating graduates. Pamela believes that at PhD level the graduates that come through the system are good, but that at bachelor degree level there could be some improvement.

However, this may be a changing trend, as in the US especially we are seeing universities making investments in their big data, analytics and statistical courses. This will hopefully see an improvement in the quality of their statistical bachelor degree graduates.

One of the ways in which healthcare companies have tried in the past to make up for the dearth of healthcare-centric analytical talent is transformation. This is done by adapting either technical thinkers to healthcare or healthcare thinkers to become more technical. The issue this creates is a bias towards one side of a role that should be balanced.

Pamela Bonifay Peele, Chief Analytics Officer, UPMC


The big data buzz over the past two years has created a thirst for technical skills amongst thousands of companies. The success of early adopters such as Google and Facebook, whose primary income drivers derive from big data, has caused business leaders to sit up and take notice.

Bridging the Big Data Skills Gap

Chris Towers, Assistant Editor


The demand has pushed up the potential salaries of those currently working in the area, whilst also creating a gap in the supply of data skills required across the wider business community.

So with companies now looking at big data implementation, how can they bridge the skills gap without compromising the quality of their analysis?

Building a team

When discussing with those in the industry, the question I often pose is what skills are needed to really succeed in data science? In addition to technical knowledge around Hadoop, stacks, SQL and other technologies, most say business understanding.

The idea behind this is that a data scientist should be more than just a data user: they should be a true catalyst for business change, with the business knowledge to implement these new ideas across the corporation.

Although this may be ideal, finding somebody like this is almost impossible. All people have strengths and weaknesses and data scientists are no different. They may be brilliant at data mining and analysis but weaker at communicating findings.

Therefore companies should be looking to create a data science team. By identifying what you need to use the data for and who the likely recipients will be, those with the necessary individual skills can be brought in to create functional teams. I know of companies who employ journalists within their data team to help communicate findings and business consultants to streamline integrations. Data science teams need to be viewed like factories, where to get the end product you need several different people putting it together.

Changing places

There is a crossover between many current roles that exist in organisations and the skills needed to become an effective data scientist. The prime example of this would be web developers.

Although CSS and HTML do seem like relatively basic coding languages, in reality the crossover between these and the manipulation of data is strong. The creation of stacks is essentially code manipulation, something that web developers do within their current roles. Because of this, with some additional training they could technically start a data science programme.

By plucking these people from their current roles and training them through external companies such as Cloudera, companies are likely to gain not only the technological understanding but also a wider business knowledge.

External companies

Sometimes companies over-invest in aspects of their business which in reality can be outsourced. The same is true of data management and analysis.

Of course outsourcing is not always possible, due to confidentiality and legal issues surrounding some data. However, the majority can simply be outsourced to a company who have the experts there already. Letting another company do the leg work for your data makes perfect sense. Having experts working on your data who aren't on your payroll also means that you do not need to try to find a qualified candidate.

As a prime example of the difficulties around this, Kirk Borne, one of the early pioneers of modern big data, says that the roles have been reversed in the past 10 years. A decade ago there would be one job for one hundred relevant graduates; today there are one hundred jobs for one graduate. Avoiding the time and money spent on recruiting in such a competitive market allows these to be reinvested in implementing the findings from the outsourced data.

The skills gap is something that everybody in the industry is well aware of, and until we have the number of graduates to match the number of jobs, it will be an issue. The gap might grow or shrink, but at the moment companies need to find ways to avoid falling into it.


A recently released Gartner survey claimed that 65% of companies are undergoing some kind of big data initiative in 2013. This is an impressive number considering that three years ago big data was almost unheard of outside of the analytics community.

How Pricing is Allowing Big Data to Launch

George Hill, Chief Editor


Given that there is now a large number of companies who are using or considering big data, why are they adopting it now? Is it the hype? Is it the increased numbers of candidates with the correct skills?

The simple answer is the price.

Only a few years ago, the technology needed to properly analyse a terabyte of data was very expensive. The processing speed needed within a system would have cost hundreds of thousands of dollars, making it unaffordable for all but the largest companies.

In 2013 this figure has dropped as low as $25,000 for the same technology.

Complex and potentially volatile databases can now be run through in-memory processes as a result of this price drop. This is the process in which a database is held constantly in RAM as opposed to being stored on a hard drive. The difference between these two forms of database is astounding, as several steps within the loading and processing are skipped. This means that information stored in memory can be utilised as much as 450 times faster than data held in a traditional database.
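To make the distinction concrete, here is a minimal sketch of the same small workload run against an in-memory database and an on-disk one. It uses Python's built-in sqlite3 module purely for illustration; the table name, row count and helper functions are assumptions of this sketch rather than anything from the systems discussed above, and the timings it reports on a laptop say nothing about the 450-times figure.

```python
import os
import sqlite3
import tempfile
import time


def load_and_total(conn, n_rows):
    """Insert n_rows into a hypothetical 'sales' table and sum one column."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        ((f"region-{i % 10}", float(i)) for i in range(n_rows)),
    )
    conn.commit()
    total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
    elapsed = time.perf_counter() - start
    return total, elapsed


def compare(n_rows=50_000):
    """Run the same workload in RAM and on disk; return totals and timings."""
    # ":memory:" tells SQLite to keep the whole database in RAM.
    mem = sqlite3.connect(":memory:")
    mem_total, mem_time = load_and_total(mem, n_rows)
    mem.close()

    # A file path makes SQLite persist the database to disk instead.
    with tempfile.TemporaryDirectory() as workdir:
        disk = sqlite3.connect(os.path.join(workdir, "sales.db"))
        disk_total, disk_time = load_and_total(disk, n_rows)
        disk.close()

    return mem_total, disk_total, mem_time, disk_time


if __name__ == "__main__":
    mem_total, disk_total, mem_time, disk_time = compare()
    print(f"in-memory: total={mem_total}, {mem_time:.4f}s")
    print(f"on-disk:   total={disk_total}, {disk_time:.4f}s")
```

Both runs return the same total; only where the data lives changes. How large the timing gap is depends heavily on the hardware and the workload.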

The use of cheaper systems to implement this powerful analytics technique has meant that even startups can realistically utilise it in order to run big data programmes.

Alongside dramatically falling technology prices, we have also seen increased use of free and open source software.

The community aspect of programmes like Hadoop has not only meant that thousands of people can help to improve the product daily, but also that it can stay free.


Everybody is using Hadoop or similar systems, not necessarily because it is free, but because it is one of the best. The ability to have one of the top performing pieces of software available at no cost, combined with the cheap technology that is now available, has made big data accessible to companies and made programmes a feasible idea.

When we are looking at these products, however, we are not factoring in one crucial component: in order to effectively utilise a big data system, analysts and data scientists are needed.

In reality, having good software and good hardware can get a company to a certain point, but to create truly actionable initiatives from their data, companies need to be able to drill down and notice patterns. This is something that only a person who is educated and experienced would really be able to do.

So how are companies getting around this?

Through the cloud.

The reduction in price means that it isn’t only large companies who can create their own big data programmes: start-ups can now be formed that realistically create these kinds of technologies themselves in order to service others. With skilled and entrepreneurial people having the ability to use these technologies, combined with the ability to use the cloud for data transfer, these skills can be truly shared. This is why companies such as Qubole can be used not only to work through data analysis, but to do so with better technology and a faster turnaround.

Bandwidth is widening each year and the decreasing price of super fast broadband has meant that outsourcing big data initiatives is often now the cheaper option, despite the decrease in price for doing it in house.

This has created a situation where companies have the ability to bridge the big data skills gap without the pressure to bring a full time analyst on to the payroll. Due to the decrease in price and the ability of smaller companies to start using these technologies, companies can now go to outsourced and qualified data scientists.

This kind of change within the big data system is huge and has the potential to revolutionise the way that companies use their big data programmes.

With the big data skills gap potentially throwing spanners in the works for many companies, this fast technology combined with super fast broadband will allow data analysis to be outsourced, truly creating an environment where big data companies can exist for the sake of big data.

This decrease in technology price will have a profound effect on the industry and, with prices looking to decrease even further in the next few years, could spark even more change.

Suddenly big data can be outsourced to truly qualified and driven professionals, which will see the exponential growth curve continue.


At the Big Data Innovation Summit, Boston in September 2013, Stephen Wolfram took the stage to deliver a presentation that many have described as the best amongst the hundreds that took place over the 2 day event.

Discussing his use of data and the way that his Wolfram Alpha programme and Mathematica language are changing the ways that machines utilise data, the audience was enthralled.

Big Data Innovation with Stephen Wolfram

Heather James, Big Data Innovation Summit Curator

I had initially organised to sit down with Stephen immediately following his presentation, but I was forced to wait for several hours due to the crowds surrounding him as soon as he finished. The 20 people surrounding him for an hour after his presentation were testament to Stephen's achievements in the past 25 years. When I spoke to others around the conference, the most common adjective was 'brilliant'.

During the afternoon I did manage to sit down with Stephen. What I found was a down to earth, eloquent man with a genuine passion for data and the way that we are using it as a society.

Stephen is the CEO and founder of Wolfram Alpha, a computational knowledge engine designed to answer questions using data rather than suggesting results like a traditional search engine such as Google or Bing. Wolfram Alpha is the product of Stephen's ultimate goal: to make all knowledge computational, turning natural language questions into data driven answers.

He describes it as 'A major democratisation of access to knowledge', allowing people the opportunity to answer questions that previously would have required a significant amount of data and expert knowledge. According to Stephen the product is already being used everywhere from education to big business. It is a product on the up.

Many will claim that they have never used the system; however, anybody who has asked Apple's Siri system on the iPhone a question will have unwittingly experienced it. Along with Bing and Google, Wolfram Alpha powers the Siri platform, enabling users to ask questions in standard language and have them translated into data driven answers.

What Wolfram Alpha is doing differently to everybody else at the moment is taking publicly held knowledge and using it to answer questions rather than simply showing people how to find the information. It allows users to ingest the information that others have found to find interesting and deep answers.

Stephen has a real passion for data, not only through Wolfram Alpha and his mission to make knowledge computational, but also on a personal level. He is the human who has the record for holding the most data about himself. He has been measuring this for the past 25 years and he can see this becoming more and more popular in wider society, with wearable personal measurement technologies becoming increasingly popular.

This change in the mindset of society as a whole, towards a more data driven and accepting society, is what Stephen believes to be the key component in Wolfram Alpha now becoming what it is. Stephen says that he always knew that there would come a time when society had created enough data to make Wolfram Alpha viable, and that time is now.

This is testament to how far we have come as an industry that we can now power something like Wolfram Alpha through the amount of data that we have now recorded. It is a real milestone in the development of a data driven society.

The reason for this, according to Stephen, is that many of the key data sources haven't been around for a long time; things like social media and machine data have allowed this shift to occur. He only sees this trend continuing, with increasing numbers of machine driven sensors collecting data.

With the use of data at Wolfram Alpha now hitting an all time high, I was curious about where Stephen thought big data would be in 5 years' time. He believes that the upward curve will only continue: personal analytics will become part of a daily routine and this will only see the amount of data increase.

He also sees the use of science and mechanics having a profound effect on the ways in which companies utilise their data. We will see analysis looking at more than just numbers, giving these numbers meaning through scientific principles.

Overall, what I have learnt from talking to Stephen is that data is the future in more than just a business context. Software that allows people to mine data without realising they are even doing it will be important to the development of how we use information.

Wolfram Alpha is changing the data landscape and with the passion and genius of Stephen Wolfram behind it, who knows how far it could go.


Recently companies have received a bad reputation for how they are holding individuals' information. There have been countless data leaks, hackers exposing personal details and exploitation of individual data for criminal activities.

Data Transparency

Claire Walmsley, Big Data Expert

The world's press has had its attention drawn towards data protection and individual data collection through the NSA and GCHQ spying scandal. Society in general is becoming more aware of the power that its data holds, and this, combined with the increased media attention, has led to consumers becoming more data savvy.

Companies like Facebook and Google have made billions of dollars through their efficient use of data and are now looked at warily by many. Although major data secrecy violations are yet to occur at either organisation, the reality is that people know that data is held about them and need to trust the company who is keeping it.

So how can companies become more trustworthy with their customer data?

One of the keys to success within a customer base is trust, and the best way to gain this is through transparency. Allowing people to see what kind of information any particular company holds on them creates trust. Outlining exactly what is held on people will create an understanding of what the information is used for.

A sure fire way to lose trust is through the 'if you don’t ask you don’t get’ approach to data collection visibility. This is the idea that when reading complex or overly long agreements the data protection aspects are available, but not explicitly stated. In reality this is much of what has happened in several cases, with information management details being buried in the small print, so that although technically accessible, they are in reality not effectively communicated.

The best way to circumnavigate this is to make it clear: send an email, have a separate section or even a blog that outlines how data is being used and why. It is very seldom that people are having their data used in manipulative or sinister ways, and making them aware of how their data is improving their experiences will make an audience far more receptive to it being used.

At the moment there are ways that you can check on certain elements of how your data is being used. Using a Google account you can see what Google has matched to you here: google.com/dashboard/

This allows you to see who Google presumes you are based on your browsing history and what ads are therefore targeted towards you. It is often interesting to see what your actions online say about you. This detail is a move in the right direction for companies but still leaves an enigmatic feeling that there isn't total transparency.

With the pressures of data protection surrounding most companies today, this kind of move would allay many of the fears that consumers currently have when their data integrity is in question.

What the industry needs today is consumer trust, and transparency is one of the key components to achieving this.


There was a good article on Gigaom recently that I thought deserved some additional attention here. The article was focused on building data science into your product offerings rather than trying to “bolt” on the data science aspects afterwards. The key take away from the article was this:

Baking Data into Your Core Product

Tom Deutsch, Big Data Solution Architecture, IBM


For startups, data science should not be seen as a separate scientific initiative but as an integrated part of the product. Speed and efficiency are key factors to burgeoning companies; hiring and building out a team of data scientists, or more aptly named “data product engineers,” is paramount. Once you accept that data science is about building data products, you will see that your data engineers, contrary to popular belief, do not need PhDs. Instead, they need to be able to integrate into the core of your product and engineering organization.

For those that know me from my monthly column at IBM Data Magazine it probably won’t be shocking that I am going to argue that this notion isn’t only valid for startups. In fact I find the notion that the advice (which to be clear is good advice) is somehow tied to startups pretty goofy. Baking analytics into all your products should be something all firms do – full stop. So how do you actually do that? Well, it starts with rethinking what the product actually is.

The tendency for most firms is to think of a product as a fixed thing where tailoring to a user is done in segments and only on the edges of the product. Think of your typical web page; 90% of it is completely standard and the parts that change are largely generic. That is legacy thinking and is not going to keep your customers engaged. Instead think of the product as a variable thing driven by user interactions in a segment that is designed to change, designed to flex in real-time based on the analytics and data science that is built into the product. The user needs to shape the experience and the content of the product as they interact with it – it needs to be contextual, relevant and unique to the person doing the interacting.

Now some of you may be wondering at this point how a product can do that, and of course it can’t unless you extend the notion of the product to include the underlying platforms that support it. This is a key point – historically we’ve built products and they simply ran on top of a platform. Going forward the platform capabilities are a core part of the product, and that means exploiting a ‘Fit For Purpose’ approach to architecture so you are from day one thinking about how the right dynamic experience is built from the ground up. This approach will surface data and analytics needs that can run in Customer Time. It has a notion of a closed loop analytics process where interactions are recorded, the experience is tweaked, interactions are recorded again, and so on: rinse, wash, repeat. This approach will build experimentation and A/B testing into the core design. We’ll pick this up in more detail in a future post. Until then, thanks for the ideas and comments.
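As a rough illustration of what such a closed loop might look like in code, the sketch below records each interaction, updates per-variant conversion rates and uses them to decide what the next user sees. It is one possible reading, not the architecture the column has in mind: the class, the variant names and the simulated conversion rates are all hypothetical, and the selection rule is a simple epsilon-greedy strategy swapped in for the sake of a short example.

```python
import random


class ClosedLoopExperiment:
    """Record interactions, then let the results steer what is shown next."""

    def __init__(self, variants, explore=0.1, seed=0):
        self.shown = {v: 0 for v in variants}
        self.converted = {v: 0 for v in variants}
        self.explore = explore          # share of traffic kept for testing
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def rate(self, variant):
        """Observed conversion rate for one variant (0.0 if never shown)."""
        shown = self.shown[variant]
        return self.converted[variant] / shown if shown else 0.0

    def choose(self):
        """Pick the variant for the next user."""
        no_data_yet = not any(self.shown.values())
        if no_data_yet or self.rng.random() < self.explore:
            return self.rng.choice(list(self.shown))  # keep experimenting
        return max(self.shown, key=self.rate)          # show the current best

    def record(self, variant, did_convert):
        """Close the loop: feed the observed interaction back in."""
        self.shown[variant] += 1
        if did_convert:
            self.converted[variant] += 1


def simulate(true_rates, rounds=1000, seed=0):
    """Drive the loop with made-up users whose behaviour follows true_rates."""
    experiment = ClosedLoopExperiment(list(true_rates), seed=seed)
    users = random.Random(seed + 1)
    for _ in range(rounds):
        variant = experiment.choose()
        experiment.record(variant, users.random() < true_rates[variant])
    return experiment


if __name__ == "__main__":
    result = simulate({"layout_a": 0.05, "layout_b": 0.12})
    for name in result.shown:
        print(name, result.shown[name], round(result.rate(name), 3))
```

The point of the sketch is the shape of the loop rather than the particular rule: experimentation is part of the serving path, so every interaction both delivers the product and refines it.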
