www.ovum.com
© Copyright Ovum 2015. All rights reserved.
Finding order in chaos: Building smarter data lakes with semantics
Surya Mukherjee, Senior Analyst, Information Management
@SuryaatOvum
Surya Mukherjee
• Leads Ovum’s analytics practice
• Keynote speaker at several global analytics
events and independent analytics thought-leader
• Advisor to numerous small to large enterprises
on analytics and data
• Independent product, vendor, and market
evaluator
• Experience in both working for and advising the
lines of business
‘Smart’ data lake or data landfill? The difference may be ‘semantic’
Data lakes are fast becoming a front-burner issue as early Hadoop adopters plan or consider implementation
They are attractive for several reasons: independence from fixed schemas, commodity hardware, an economical alternative for archiving, and cross-platform insights
In principle, data lakes should be transparent, manageable, and governable, even if only incrementally; without this, organizations may be exposed to risk and lower ROI
There are many approaches from data platform and integration providers to making data lakes governable, each with positives and tradeoffs
The semantic approach, which is driven by taxonomies and ontologies, can be extremely helpful for industries such as financial services and healthcare
In today’s webinar, we explore the world of semantics and how it can be used to make your data lake ‘smart’
Agenda
Data Lake enters the enterprise agenda
Data landfill versus a ‘smart’ data lake
Key components to a smart data lake
The semantic approach to data lakes
Recommendations for enterprises
Data Lake enters the enterprise agenda
Everyone’s talking about data lakes, because:
Hypermarket for all data types, speeds, and sizes
No need for joining everything now
Batch, real-time, or in-betweens
Expert/scientists, data analysts, business users
Cost
Re-use of skills and software
Not only for web-scale companies!
Many audiences, one lake
But what makes a lake, a lake?
Ovum's definition of a data lake is a governed, tagged, workable repository that becomes the default ingest point for raw data.
We strongly believe that without governance, a data store (structured, unstructured, or both) cannot be called a data lake and is better labelled a data swamp or landfill.
Our requirements for data lakes are therefore more stringent than those of many others, who classify any solution that can store multi-structured data as a data lake.
Data landfill versus smart data lake
Stages of Hadoop adoption
Key components to a “smart” data lake
Analysis-ready
The semantic approach to data lakes
What is it?
Developed by the W3C for the World Wide Web
Primarily three technical standards: RDF (Resource Description Framework), SPARQL (SPARQL Protocol and RDF Query Language), and OWL (Web Ontology Language)
Subject: Darth Vader
Property: IsAlso
Object: Anakin Skywalker
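The subject/property/object shape above can be sketched in a few lines of plain Python. This is a minimal illustration only, not how an RDF store or SPARQL engine is implemented: the `match` helper and the in-memory `TRIPLES` list are hypothetical names introduced here, and the second triple is an illustrative addition that is not on the slide.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical in-memory triple list. The first triple is the slide's example;
# the second is an illustrative addition.
TRIPLES: List[Triple] = [
    ("Darth Vader", "IsAlso", "Anakin Skywalker"),
    ("Anakin Skywalker", "IsFatherOf", "Luke Skywalker"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    prop: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if prop is not None and p != prop:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


# "Who is Darth Vader also known as?" expressed as a triple pattern.
aliases = [o for _, _, o in match(TRIPLES, subject="Darth Vader", prop="IsAlso")]
print(aliases)
```

Leaving a position as a wildcard is loosely analogous to a variable in a SPARQL triple pattern; a real deployment would use an RDF library or graph database rather than a Python list.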
Relationship depiction in RDF
Benefits of a semantic approach
Creating linked and contextualized content that depicts interrelationships between data entities, enabling deeper meaning, insight, and action
Shortening of time taken to massage data for analysis
Additional layer of metadata
Combined with technologies such as graph databases, easy to visualize relationships and explore data
Once data is meta-tagged, very easy to analyze, and extremely flexible
Mature security environment
Easier inventory management
Source/target based integration/operation
Recommendations for enterprises
Keep it focused on business pain points and use cases
Start small, and grow
Get executive sponsorship early
Requires team efforts from both business and IT
Thank you!
©2015 Cambridge Semantics Inc. All rights reserved.
The Anzo Smart Data Platform for Linking and Contextualizing Large, Diverse Datasets
Cambridge Semantics Contact:
Marty Loughlin
Vice President, Financial Services
Cambridge Semantics
141 Tremont St., 6th Floor, Boston, [email protected]
(o) 617.855.9565
©2016 Cambridge Semantics Inc. All rights reserved. Company Confidential Page 15
The Anzo Smart Data Platform
• An agile, end-to-end platform for tackling diverse information challenges
• Link and contextualize information for search, analytics, visualization and collaboration
State Street Bank/D&B/EDM Council FIBO Solution Architecture
[Diagram labels: Internal Data Sources; External Data Sources; Front Arena Data; Dun & Bradstreet Data; Derivatives Data; Entity & Corp. Hierarchy Data; Map & Load (QA); Link & Query (Classification, analytics); Reports & Analytics]
Project Deliverables
1. Load & operationalize FIBO in Anzo
2. Map data sources onto FIBO
3. Load, harmonize, QA and classify data
4. Configure analytic dashboards
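As a rough illustration of what "map data sources onto FIBO" could involve, the sketch below turns one tabular source record into subject/property/object triples using a column-to-property mapping. Everything here is an assumption made for illustration: the `ex:` property names are placeholders rather than actual FIBO identifiers, the column names are invented, and this is not the Anzo mapping mechanism.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical mapping from source column names to ontology property names.
# The "ex:" names are placeholders, NOT actual FIBO identifiers.
COLUMN_TO_PROPERTY: Dict[str, str] = {
    "legal_name": "ex:hasLegalName",
    "parent_id": "ex:hasParentEntity",
}


def record_to_triples(
    record: Dict[str, str],
    id_column: str,
    mapping: Dict[str, str],
) -> List[Triple]:
    """Convert one source row into triples, one per mapped, non-empty column."""
    subject = f"ex:entity/{record[id_column]}"
    triples: List[Triple] = []
    for column, prop in mapping.items():
        value = record.get(column)
        if value is None or value == "":
            continue  # skip missing values rather than emit empty objects
        triples.append((subject, prop, str(value)))
    return triples


# Invented sample row; parent_id is empty, so only one triple is produced.
row = {"entity_id": "42", "legal_name": "Example Corp", "parent_id": ""}
triples = record_to_triples(row, "entity_id", COLUMN_TO_PROPERTY)
print(triples)
```

Keeping the mapping in a plain dictionary separates "which column means what" from the conversion logic, so a new source can be onboarded by supplying a new mapping rather than new code.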
• Business understandable models describe data and transformations
• Searchable Catalog of Data Sources, Maps & Metadata
• Query model for data lineage, impact analysis, data quality
Anzo Smart Data Lake
Anzo Smart Data Lake Server
Anzo Enterprise Server
• Standardized reports and self-service data discovery for diverse use cases
• Data curation, annotation and application workflow
Anzo Graph Query Engine
• Load, transform and harmonize diverse internal and external data sources
• Link to business meaning (e.g., FIBO)
Data Store/File System
Third party BI/Analytics
Data Providers: Structured Sources, Unstructured Sources