Making Big Data work Lewis Crawford Principal Architect @ the DataShed thedatashed.co.uk [email protected] © the DataShed Limited 2015


intro


Who am I?

• For the last 3 years, the DataShed has been providing consultancy services to a vast array of large clients. Our primary focus is ensuring that technology and analytical strategies are truly aligned so that businesses can leverage the latest and greatest in technology to model, mine and describe their data asset.

• We were working with Big Data technology before the term was coined. We have experience delivering analytical systems driven by petabyte data sets, and have designed, implemented and supported one of the largest real-time data integration and predictive analytics platforms in the aviation world.

• Our model is based on using a small number of exceptionally highly skilled individuals to deliver disruptive and innovative solutions in an agile and delivery-focused manner.


So what is ‘Big Data’?


Why do Big Data projects fail?

Too many people think that Big Data is:

“The belief that the more data you have, the more insights and answers will rise automatically from the pool of ones and zeros.”

Gill Press, Forbes.com


How to make Big Data work?

1. Understand your problem

2. Apply appropriate tools

3. Automate everything.


Real-time data


Continuous Integration Demo


How to make Big Data work?

1. Understand your problem

2. Apply appropriate tools

3. Automate everything.


Little Big Data


A problem closer to home…

• Every business needs to understand:
  • Their potential customers and market
  • Current customers
  • Their products and sales
  • How and when they engage prospects and customers
• Analytics and data are expensive
• Many of the mandatory elements are very similar for everyone
• The DataShed is Analytics as a Service and Single Customer View as a Service.


The deduplication problem…

• SME has 250,000 customers (two systems of record)
• Brute-force approach to identifying duplicates: 31,249,875,000 pairwise comparisons
• Building a system to process a minimum of 100 clients a day…
• Roughly 3.1 trillion comparisons a day, using > 10 different algorithms
• Traditional scale-up approach would be expensive, and makes large assumptions around blocking and partitioning rules
• A small data problem but a big data solution?
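The figures above follow from counting every unordered pair of records once, n(n - 1)/2. A minimal check of that arithmetic, using only the numbers on the slide (the variable names are illustrative):

```python
from math import comb

customers = 250_000        # customers per client, from the slide
clients_per_day = 100      # minimum daily client volume, from the slide

# Brute force compares every unordered pair once: n * (n - 1) / 2.
pairs_per_client = comb(customers, 2)
pairs_per_day = pairs_per_client * clients_per_day

print(f"{pairs_per_client:,}")  # 31,249,875,000
print(f"{pairs_per_day:,}")     # 3,124,987,500,000
```

The count grows quadratically with the number of records, which is why a data set that fits comfortably on one machine can still produce a very large comparison workload.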

Title  First Name  Surname  Address 1       Address 2    Address 3
Dr     R J         Smith    Two Oaks        112 Old St.  County Durham
Mrs    Robyn       Smith    112 Old Street  Durham       DH1 5YJ
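To make the two rows above concrete, here is a minimal sketch of scoring one candidate pair with more than one algorithm. It is not the DataShed's implementation: the abbreviation table, the function names and the choice of measures (a character-level ratio from Python's difflib and a token-set overlap) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# The two rows from the slide, with the address columns joined together.
record_a = {"title": "Dr", "first_name": "R J", "surname": "Smith",
            "address": "Two Oaks 112 Old St. County Durham"}
record_b = {"title": "Mrs", "first_name": "Robyn", "surname": "Smith",
            "address": "112 Old Street Durham DH1 5YJ"}

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"st": "street", "rd": "road"}

def normalise(text):
    """Lower-case, drop full stops and commas, expand known abbreviations."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    return [ABBREVIATIONS.get(token, token) for token in tokens]

def sequence_similarity(a, b):
    """Character-level similarity in [0, 1] over the normalised text."""
    left, right = " ".join(normalise(a)), " ".join(normalise(b))
    return SequenceMatcher(None, left, right).ratio()

def token_overlap(a, b):
    """Jaccard overlap of the normalised token sets, in [0, 1]."""
    left, right = set(normalise(a)), set(normalise(b))
    union = left | right
    return len(left & right) / len(union) if union else 0.0

def initials_compatible(a, b):
    """True when the first initials agree, so 'R J' can match 'Robyn'."""
    return bool(a and b) and a[0].lower() == b[0].lower()

def blocking_key(record):
    """One illustrative blocking rule: surname plus first initial."""
    return (record["surname"].lower(), record["first_name"][:1].lower())

address_ratio = sequence_similarity(record_a["address"], record_b["address"])
address_overlap = token_overlap(record_a["address"], record_b["address"])
names_agree = initials_compatible(record_a["first_name"], record_b["first_name"])
same_block = blocking_key(record_a) == blocking_key(record_b)

print(round(address_ratio, 2), round(address_overlap, 2), names_agree, same_block)
```

Under this particular blocking rule the two rows land in the same block and do get compared. A rule keyed on postcode would not pair them, because the first row as shown carries no postcode, which is the kind of assumption the slide warns about.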


The Shed demo


How to make Big Data work?

1. Understand your problem

2. Apply appropriate tools

3. Automate everything.


How to make Big Data work?

1. Understand your problem
• ‘Big Data’ challenges aren’t necessarily new, although much of the technology is
• Articulate and communicate: focus on distilling your problem down
• Incremental improvement, not wholesale replacement

2. Apply appropriate tools
• Understand the economics as well as the technology
• New technologies need to be evaluated within the context of your problem scope
• New technologies are enablers, not deliverables (#datalake)
• ‘Big Data’ technology should be seen as complementary to existing technology

3. Automate everything
• Continuous integration to include all testing
• Containerise where possible
• Measure everything
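As one small illustration of the third point, the sketch below pairs a pipeline step with an automated check and a timing measurement. Everything in it is hypothetical: the `measured` decorator, the in-memory `TIMINGS` store and the `standardise_postcode` step are invented for the example and are not taken from the talk.

```python
import time
from functools import wraps

TIMINGS = {}  # hypothetical in-memory metrics store: step name -> list of seconds

def measured(step):
    """Record the wall-clock duration of every call to `step`."""
    @wraps(step)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return step(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            TIMINGS.setdefault(step.__name__, []).append(elapsed)
    return wrapper

@measured
def standardise_postcode(raw):
    """Hypothetical pipeline step: upper-case a UK postcode and fix its spacing."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"

def test_standardise_postcode():
    """The kind of check a CI server could run on every commit."""
    assert standardise_postcode("dh1  5yj") == "DH1 5YJ"
    assert standardise_postcode("DH15YJ") == "DH1 5YJ"

test_standardise_postcode()
print(len(TIMINGS["standardise_postcode"]))  # 2 calls measured
```

In a real project the test would live in a test suite run by the CI server, and the timings would go to a metrics system rather than a dictionary.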


If you really want to get involved…


Get your hands dirty

If you’re interested in learning more, we’ll be hosting a hands-on labs event in the near future. Send your details to:

Email: [email protected]: @thedatashed


Any questions?


Lewis Crawford, Principal Architect @ the DataShed, thedatashed.co.uk, [email protected]