Analytical Systems Evolution: from Excel to Big Data Platforms and Data Lakes
Kiev, 16 Nov 2017
Big Data Meetup #1
Maxim Tereschenko
BI / Big Data Lead
About Me
From BI Developer to Delivery Manager
Timeline: 2005, 2008, 2009, 2010, 2015, 2017
Roles: Product Owner, BI Developer, BI Business Analyst, BI Consultant, Practice Lead, Business Development
Company types: Consulting, Product, Outsourcing, Enterprise, Consulting
About Provectus
Over 400 talented techies
Established as Provectus Inc. in 2010
R&D offices in the US, Ukraine, Russia
Work with enterprise-level Silicon Valley companies & fast-growing startups
Organization Structure
Provectus
Reinvently (mobile, UI/UX development)
OrbitLift (eCommerce, blockchain, IoT)
Squadex (DevOps, Big Data & Analytics)
Hydrosphere
Agenda
● Data Analysis Stages
● Relational Data Warehouse
● Extended Relational Data Warehouse
● Big Data Challenges
● Modern Analytic Landscape
● Big Data Platform
● Data Lakes
● Future Trends Predictions
Relational Data Warehouse
A central repository of integrated data from one or more disparate sources
Reporting & Analysis
Data Governance
DWH Use Cases
Corporate Reporting
Pixel Perfect Reporting
Ad-hoc Analysis
Real-Time Analytics
Advanced Analytics
All Data Analysis
Self-service BI
Agility
Scalability
Cost
Performance
Consistency
Velocity
Security
Use Cases
Data Type(s)
Ext DWH Use Cases
Agility
Scalability
Cost
Performance
Consistency
Velocity
Security
Corporate Reporting
Pixel Perfect Reporting
Ad-hoc Analysis
Real-Time Analytics
Advanced Analytics
All Data Analysis
Self-service BI
Use Cases
Data Type(s)
Big Data Challenges
> 1 billion users
> 3 billion photos daily (12 000 per sec)
> 5 billion comments daily (58 000 per sec)
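As a rough check on this scale, a daily total can be converted to an average per-second rate by dividing by the 86,400 seconds in a day. The Python sketch below uses the lower bounds from the slide as inputs. The function and variable names are illustrative assumptions, and the per-second figures quoted on the slide may rest on a different measurement basis than a plain daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(events_per_day: float) -> float:
    """Average number of events per second for a given daily total."""
    return events_per_day / SECONDS_PER_DAY


# Lower bounds taken from the slide; the dictionary itself is illustrative.
daily_volumes = {
    "photos": 3_000_000_000,
    "comments": 5_000_000_000,
}

for name, daily in daily_volumes.items():
    print(f"{name}: {per_second(daily):,.0f} per second on average")
```

Averages like these understate peak load, so capacity planning usually needs headroom above them.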
Typical Big Data Challenges
Data Sources: Archives, Docs, Business Apps, Media, Social Networks, Public Web, Data Storages, Machine Log Data, Sensor Data
Structure: STRUCTURED, UNSTRUCTURED
Scale: LOW, MEDIUM, HIGH
Dimensions: Velocity, Variety, Volume, Complexity
Architecture Concerns:
• Scalability
• Performance
• Extensibility
• Data Quality
• Fault-Tolerance and Availability
• Security
• Cost
• Skills Availability
Big Data Questions
Data Discovery
Dashboards and Business Reporting
Real-Time Intelligence
Business Users
Intelligent Agents
Consumers
How to implement recommendations or anomaly detection while achieving low latency?
Data Scientists/Analysts
How to enable the Data Science/Advanced Analytics team for predictive and advanced analytics?
How to provide real-time dashboards or self-service BI with high data quality and good performance over terabytes and petabytes?
Operations
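To make the low-latency anomaly detection question concrete, here is a minimal Python sketch of one possible approach: a rolling z-score detector that keeps running sums over a fixed window, so each event is scored in constant time. This technique is not taken from the talk. The class name, window size, threshold, and sample stream are all illustrative assumptions.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flags values far from the mean of a fixed-size sliding window (hypothetical helper)."""

    def __init__(self, window: int = 50, threshold: float = 3.0, min_samples: int = 5):
        self.window = window
        self.threshold = threshold
        self.min_samples = min_samples
        self.values = deque()
        self.total = 0.0      # running sum of the values in the window
        self.total_sq = 0.0   # running sum of their squares

    def observe(self, x: float) -> bool:
        """Score x against the current window, then add x to the window."""
        is_anomaly = False
        n = len(self.values)
        if n >= self.min_samples:
            mean = self.total / n
            variance = max(self.total_sq / n - mean * mean, 0.0)
            std = math.sqrt(variance)
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_anomaly = True
        # Update the window in O(1): append the new value, evict the oldest if full.
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.window:
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old
        return is_anomaly


detector = RollingZScoreDetector(window=20, threshold=3.0)
stream = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100, 10]  # made-up sample data
flags = [detector.observe(v) for v in stream]
print(flags)
```

Because the detector only touches running sums, the per-event cost does not grow with the window size. In this sketch a flagged value is still added to the window, which is a simplification a production system might revisit.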
Modern Analytic Landscape
A modern integrated approach for solving Big Data/Business Analytics needs across multiple verticals and domains
All Data
Real-time Data Processing
Data Acquisition and Storing
Data Integration
Enterprise Data Warehousing
Data Management (Governance, Security, Quality, MDM)
Analytics
Reporting and Analysis
Predictive Modeling
Data Mining
Data Lake (Landing, Exploration and Archiving)
UX and Visualization
Applications
Application data
Media data: images, video, etc.
Social data
Enterprise content data
Machine, sensor, log data
Docs and archives data
Customer Analytics
Marketing Analytics
Web/Mobile/Social Analytics
IT Operational Analytics
Fraud and Risk Analytics
Complex Event Processing
Real-time Query and Search
Big Data Platform
Real-Time Analytics
Self-Service BI
Streaming
Pixel Perfect Reporting
Advanced Analytics
All Data Analysis
Corporate Reporting
Use Cases
Agility
Scalability
Cost
Performance
Consistency
Velocity
Security
Data Type(s)
Data Lakes Technology
Based on TDWI (https://tdwi.org/) research:
AWS Data Lake
Azure Data Lake
Data Lakes Architecture (Example)
https://www.searchtechnologies.com/blog/search-data-lake-with-big-data
Data Lakes
Self-Service BI
Advanced Analytics
Predictive Analytics
All Data Analysis
Text Mining
Pixel Perfect Reporting
Corporate Reporting
Use Cases
Agility
Scalability
Cost
Performance
Consistency
Velocity
Security
Data Type(s)
Future Predictions by Gartner
● Next-Generation Data Discovery
● Smart Data Discovery Capabilities
● Natural-Language Generation and Artificial Intelligence
● 50% of analytic queries will be generated using search, natural-language processing or voice, or will be autogenerated
● Organizations that offer users access to a curated catalog of internal and external data will realize twice the business value from analytics investments compared with those that do not
https://www.gartner.com/doc/reprints?id=1-3TYE0CD&ct=170221&st=sb
Voice Analysis
https://www.gartner.com/doc/reprints?id=1-3TYE0CD&ct=170221&st=sb
Any Questions?
https://www.linkedin.com/in/maxter
maxterkiev
maxim.tereschenko
Thank you!
https://www.facebook.com/provectuslife/