“Big Data”: The wrong name for a major issue?
Clive Longbottom, Service Director, Quocirca Ltd
© Quocirca 2013
“Big Data”
• It’s not about databases per se
• It is about:
  – Volume – but not just databases
  – Velocity – results need to be produced in near real-time
  – Variety – the aspect that is missed by many
  – Veracity – how good are the inputs?
  – Value – is the data worth it?
Which of the following statements most closely matches your understanding of the term “big data”?
How important do you believe big data will be to your organisation over the next 2 years?
A basic “rule of thumb”
• 20 years ago:
  – Only 20% of an organisation’s information was in electronic form
  – 80% of this was in a formal database
• Today:
  – Well over 80% of an organisation’s information is in electronic form
  – Less than 20% is in a formal database
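The rule of thumb can be worked through as simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it treats the slide's bounds for today ("well over 80%", "less than 20%") as exact point values, which is an assumption.

```python
def information_split(electronic_share, database_share_of_electronic):
    """Split an organisation's total information into three shares.

    Both arguments are fractions between 0 and 1.
    """
    in_database = electronic_share * database_share_of_electronic
    electronic_not_in_database = electronic_share - in_database
    not_electronic = 1.0 - electronic_share
    return in_database, electronic_not_in_database, not_electronic


# 20 years ago: 20% electronic, 80% of that in a formal database.
then = information_split(0.20, 0.80)
# Today: the slide's bounds (80% and 20%) taken as point values (an assumption).
now = information_split(0.80, 0.20)

for label, (db, other, paper) in (("20 years ago", then), ("Today", now)):
    print(f"{label}: database {db:.0%}, other electronic {other:.0%}, "
          f"non-electronic {paper:.0%}")
```

Under these assumed point values, the formal database holds about 16% of all information in both cases, while electronic information outside a database rises from about 4% to about 64%.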
The growth of unstructured
• Not just text – but images, video, media assets, VoIP, videoconferencing
• Replicated/archived data a large part of growth
• But – is it completely unstructured?
Source: Ram Subramanyam Gopalan
File formatting
• XML (or quasi-XML)
• CSV/tab delimited
• Text blocks
• Metadata
• TCP/IP packet header information
• Pattern recognition
• Colour, shape, texture (CST)
• Inferred data
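Several of the formats listed above carry recoverable structure even though they sit outside a database. The following is a minimal sketch using only the Python standard library; the sample inputs, field names and function names are all invented for illustration and do not come from the presentation.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical sample inputs, invented for illustration.
CSV_TEXT = "order_id,customer,amount\n1001,Acme,250.00\n1002,Globex,99.50\n"
XML_TEXT = "<invoice id='INV-7'><customer>Acme</customer><total>250.00</total></invoice>"
FREE_TEXT = "Spoke to Acme on 2013-03-14 about invoice INV-7; follow up on 2013-03-21."


def fields_from_csv(text):
    """Delimited files carry structure in their header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fields_from_xml(text):
    """XML carries structure in element names and attributes."""
    root = ET.fromstring(text)
    record = dict(root.attrib)
    record.update({child.tag: child.text for child in root})
    return record


def fields_from_text(text):
    """Text blocks can yield inferred data through pattern recognition."""
    return {
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
        "invoice_ids": re.findall(r"\bINV-\d+\b", text),
    }


print(fields_from_csv(CSV_TEXT))
print(fields_from_xml(XML_TEXT))
print(fields_from_text(FREE_TEXT))
```

The regular expressions stand in for far richer pattern recognition; the point is only that "unstructured" sources often contain fields that can be extracted and joined to database records.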
The open “value chain”
[Diagram: a chain of Supplier’s supplier, Supplier, Your Organisation, Customer and Customer’s customer, connected by information flows, alongside “open” information from e.g. search engines, social networks]
Organisation information sources
• Organisation data:
  – Enterprise application data
  – Office documents
  – Reports, analytics
  – GRC information
  – Information on competitors
  – Financial performance data
  – Images, voice, video…
  – …
Supplier information sources
• Supplier data:
  – Logistics data
  – Inventory data
  – Transactional data
  – Competitive information
  – Credit and background checks
  – Invoices, catalogues, contracts, images…
  – Voice, video…
  – …
Customer information sources
• Customer data:
  – Orders, payment details, returns information
  – Past purchases
  – Credit and background checks
  – Searches, web analytics
  – Social media comments
  – …
Information issues
• You no longer have control
  – The open value chain removes direct control
  – Security of information assets is critical
• Identifying and aggregating information assets
  – Capturing information when and where possible – and legal
  – Bringing structured and unstructured together
• Sifting through the dross to get to the “golden nuggets”
Shrink and filter…
• Information under your control:
  – Deduplicate
  – Taxonomise
  – Index
  – Tag
• Information not under your control:
  – Filter (intelligently)
  – Tag and index when it crosses your boundaries
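The deduplicate, tag and index steps can be sketched in a few lines. This is a toy illustration under stated assumptions: the taxonomy, the sample documents and every function name are hypothetical, duplicates are detected only by a hash of the case-normalised text, and tagging is plain keyword matching.

```python
import hashlib
from collections import defaultdict

# Hypothetical taxonomy: tag -> keywords that trigger it.
TAXONOMY = {
    "finance": {"invoice", "payment"},
    "logistics": {"shipment", "inventory"},
}


def deduplicate(documents):
    """Keep the first copy of each document, comparing by content hash."""
    seen, unique = set(), []
    for text in documents:
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def tag(text):
    """Assign taxonomy tags whose keywords appear in the text."""
    words = set(text.lower().split())
    return sorted(name for name, keywords in TAXONOMY.items() if words & keywords)


def build_index(documents):
    """Build an inverted index: word -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


docs = [
    "Invoice sent for shipment 42",
    "invoice sent for shipment 42",  # duplicate apart from letter case
    "Inventory count complete",
]
unique_docs = deduplicate(docs)
tags = [tag(d) for d in unique_docs]
index = build_index(unique_docs)
print(unique_docs, tags, sorted(index["invoice"]))
```

For information arriving from outside the organisation, the same `tag` and `build_index` steps would be applied at the point where it crosses the boundary, after whatever filtering is in place.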
Federate and aggregate
• Link databases
  – Use master data management
• Bring in unstructured data
  – Use Hadoop along with NoSQL datastores (e.g. Cassandra, MongoDB)
• Use cross-function search and reporting tools
  – E.g. HP Autonomy, CommVault Simpana
• Use analytics to present results in meaningful ways
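As a rough illustration of linking databases through master data, the sketch below joins two hypothetical systems that identify the same customer differently. All table and column names are invented, and a single in-memory SQLite database stands in for what would normally be separate systems; it is not a description of any particular master data management product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical systems that identify the same customer differently.
    CREATE TABLE crm_customers (crm_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE billing_accounts (account_no TEXT PRIMARY KEY, balance REAL);
    -- Master data table: one master id per real-world customer.
    CREATE TABLE customer_master (master_id INTEGER, crm_id TEXT, account_no TEXT);

    INSERT INTO crm_customers VALUES ('C-1', 'Acme'), ('C-2', 'Globex');
    INSERT INTO billing_accounts VALUES ('A-900', 250.0), ('A-901', 99.5);
    INSERT INTO customer_master VALUES (1, 'C-1', 'A-900'), (2, 'C-2', 'A-901');
""")

# The master table is the only place that knows both local identifiers.
rows = conn.execute("""
    SELECT m.master_id, c.name, b.balance
    FROM customer_master AS m
    JOIN crm_customers AS c ON c.crm_id = m.crm_id
    JOIN billing_accounts AS b ON b.account_no = m.account_no
    ORDER BY m.master_id
""").fetchall()

for master_id, name, balance in rows:
    print(master_id, name, balance)
```

The design choice shown is that neither source system is changed; the mapping lives in one shared table that reporting and search tools can join against.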
Basic schematic approach
[Diagram: SQL and NoSQL stores; MapReduce; Filter; Apply metadata; App; Search, analyse and report]
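One possible reading of the schematic is a small pipeline: records from SQL and NoSQL stores are filtered, given metadata, summarised with a map and reduce step, and then searched. The sketch below runs in a single Python process with invented records and hypothetical function names; the ordering of the stages is an assumption, and this is not how Hadoop itself is programmed.

```python
from collections import Counter
from functools import reduce

# Hypothetical records standing in for rows from a SQL store and
# documents from a NoSQL store.
sql_rows = [{"source": "sql", "text": "order shipped to Acme"}]
nosql_docs = [
    {"source": "nosql", "text": "Acme praised fast delivery"},
    {"source": "nosql", "text": "lol"},  # noise to be filtered out
]


def keep(record):
    """Filter: drop records too short to be useful (arbitrary threshold)."""
    return len(record["text"].split()) >= 3


def apply_metadata(record):
    """Apply metadata: attach a word count alongside the original fields."""
    return {**record, "word_count": len(record["text"].split())}


def map_words(record):
    """Map step: count each lower-cased word in one record."""
    return Counter(record["text"].lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


records = [apply_metadata(r) for r in sql_rows + nosql_docs if keep(r)]
word_counts = reduce(reduce_counts, map(map_words, records), Counter())


def search(term):
    """Search: return the records whose text mentions the term."""
    return [r for r in records if term.lower() in r["text"].lower().split()]


print(word_counts.most_common(3))
print(search("Acme"))
```

Here a search for "Acme" returns one record from each store, which is the kind of cross-source result the reporting layer would present.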
A future glimpse?
• It’s déjà vu all over again
  – Remember in-memory databases?
• Big data cannot remain as a jigsaw solution
  – Full-service solutions will come forward
• Who will be the winners?
  – Oracle, IBM, Microsoft?
  – SAP?
  – EMC, Symantec?
  – The Open Source environment (e.g. 10Gen, Apache/Cassandra, CouchDB)?
Conclusions
• Big Data has many vectors
  – Volume, velocity, variety and veracity: each is as important as the others – value will accrue through getting them right
• More information is outside the realm of your direct control
  – Capturing what can be captured in a useful manner is key
• The evolution of the market is rapid
  – NoSQL and Hadoop provide the underpinnings for a new, information-centric approach
• The formal database is not dead
  – But it is only one aspect of the problem – and the solution
Thank you
Contact details: [email protected]
Further reading:
http://quocirca.com/reports/150
http://quocirca.com/articles/617
http://quocirca.com/articles/637