Making Analytics Viable in Enterprises:
Potential routes for Industry 4.0
Jorge Sanz
Anusha Choori
Business Analytics Center
National University of Singapore
Agenda and Goals
• Business Analytics as an enabler for Industry 4.0
• Cases from the field, typical challenges and lessons-learnt
• Viable roadmaps for Industrial Sector companies
• Potential opportunities for Luxembourg
• Conclusions
Industry 4.0
Source: 2016 Global Industry Survey – Industry 4.0: Building the digital enterprise - PwC
Industry 4.0 – The Multi-Faceted Goal Framework
[Figure: core capability and key dimensions in the framework]
Some Key Enablers of Industry 4.0
1. Capture and process large data sources
– Collect, retrieve and query data
– Analyze data (from reporting to prediction)
2. Integrate conventional IT silos more deeply
– Integrate information-based insights into the process lifecycle
3. Reduce cost-of-ownership for viability
– Cloud and Everything-as-a-Service
• Not only for industrial segments but also for most other industries …
• … realizing some critical capabilities for Industry 4.0:
– Large-data analytics and enterprise architecture enable new thinking about production management and factory management
– Analytical algorithms (some capable of learning from data) will be able to achieve more flexibility and robustness in manufacturing, supply chain and distribution
– Different forms of “cognitive systems” to support decision-making are part of the emerging jargon (a return to AI, NLP, etc., now powered by big data)
The Role of Business Analytics
A relatively new discipline that addresses the key enablers
Key Imperative: Shorten the solution cycle and reduce costs
• Exploit new opportunities based on business analytics (from production improvements to new business models)
• Collect, transmit and analyze large data from devices to monitor and predict / anticipate service needs
• Shorten the Ideate-to-Monetize value-generation path and reduce cost-of-ownership
Knowledge Areas building the BA Domain
Business Competence
• Finance, accounting, marketing, supply chain, HR, channels, IT, customer relationship …
• Industry-specific competences: underwriting, fraud, claim life-cycle, product design, wealth management, traffic …
Organization Processes
• The design and transformation of work processes
– Decision-making processes
– Strategy processes
– Operational processes
• How information improves and innovates processes
Modeling and Technology
• Stochastic models, operations research (and tools: R, SAS, SPSS, …)
• Data generation sources (e.g., mobile messaging, GPS locators, surveillance cameras, ATMs, etc.)
• Systems in support of cognitive and information processes (Watson, HANA, etc.)
Extending & Emerging Professions
Business Analytics Applications (Most Active Markets)
Source: IDC
Projects from NUS Business Analytics Center, 2016 (I)
- PnL Analytics
- Cyber Security
- Anti Money Laundering
- Fraud analytics
- Risk assessment model for investigation program
- Optimising maintenance schedule for fleet management
- Text Mining - Social Network and Geospatial Analytics in context of Insurance
- Motor Pricing KPIs
- Travel Pricing KPIs and exploratory analysis
- Cross-sell and up-sell in insurance
- POS transactional data
- News recommendation engine for high net worth customers
- Economic Scenario Stress Testing
- The Future of Audit (1)
- The Future of Audit (2)
- Analysis of Customer Queuing Time & Headcount planning
- Predicting High Risk Churn Segments Via Product Usage Data
- BlueMix & Watson Analytics
- Breakout detection for Hep C patients
- An Analytics Approach to Improve Subscription Rate for Nursing Course (prelim title)
- Case Study on Global FP&A Transformation
- Balance Sheet Forecasting
- IT Tools Comparison
- Case Study on Global FP&A Transformation
- Sales Forecasting and Tools for Predictive Analytics
Projects from NUS Business Analytics Center, 2016 (II)
- Supply Chain Optimization 2.0
- Global logistics cost optimisation
- Social Media
- Automatic Rostering System
- Understanding Family Attitudes and Social Support Networks through Analytics
- Deriving Insights from NEHR (National Electronic Health Record)
- Developing an accurate model to provide estimates on how long a job should take given the characteristics of the job
- ALM Roll-Tagging Prediction
- Applications of Analytics to AML – Proposing a Risk Classification Model
- Analysing High Risk Segments in Auto Loan Portfolio
- Customer-Money Life Cycle
- Marketplace analysis
- Customer credit risk analysis
- Emergency Medical Service (EMS) Ambulance Demand Analytics & Prediction
- Sales Management Analytics
- HR Analytics
- Pricing Assessment Tool based on Analytics
- Healthcare analytics
- Frequent Attenders to the Emergency Department
- Online Analytics
- Analysis on overtime cost
- Analyzing cancer claims for policy holders
Projects from NUS Business Analytics Center, 2016 (III)
- IoT / Event Analytics in Manufacturing
- Data Lake architecture to deliver a virtualization layer for disparate data sources
- Market Research
- Social Media / Digital Marketing / PR
- CRM / Markets
- Optimising Endowment Portfolio Performance
- Determining optimal level of markdowns through customer segmentation for revenue maximization
- Predicting eBay Auction Sales for Laptops
- Predicting Airbnb New User Bookings
- Predicting which hotel type an Expedia customer will book
- Analysing Residential Property Prices
Industry 4.0
Manufacturing – Prevailing Challenges
• Manufacturers could improve preventive and predictive maintenance of different production assets
• Many manufacturing systems find it difficult to collect, aggregate and benefit from data originating from large data sources, because they lack novel analytical tools and appropriate infrastructure
– For example, unplanned and excessive equipment downtime increases, which directly affects operational cost and throughput
– This requires advanced prediction tools and algorithms, so that data can be systematically processed into information that explains uncertainties, breakdowns, failures and short stops, and can thereby support more “informed” decisions
By introducing analytics and more flexible production techniques, manufacturers could boost their productivity by as much as 30 percent
promises … promises
Source: Industrial Insights Report, Accenture – Industrial-Internet-Changing-Competitive-Landscape-Industries-2015
From Physical to Digital to Analytics …
[Figure: Physical World (Entities) linked to a Computational Space. 1. Cyber-Physical Interaction: learn and synchronize from the physical world (knowledge extraction and accumulation); feedback to the physical world (production scheduling, maintenance and adaptation). 2. Machine Health Awareness Analytics. 3. Optimal Decision Support Analytics. An ensemble of digital life-cycles of entities deployed in different settings …]
… yielding opportunities for new business models
Industrial companies are moving towards greater digital value-creation, from augmented products to serving digital ecosystems (figure axes: Asset Intensity, Data Intensity):
• Augmented Digital Product Player: focus on digitally-endowed products (like sensors or communication devices)
• Complete Solution / Service Provider: focus on digital products and data services that provide a complete solution for the customer
• Data Analytics, Content & Platform Integrator: focus on data analytics services; give customers access via a dedicated (online) platform (APIs)
• Integrated Digital Ecosystem Provider: integration of third-party partner or competitor products and control systems in a complete customer ecosystem
Source: 2016 Global Industry Survey – Industry 4.0: Building the digital enterprise - PwC
Predictive Maintenance Analytics
[Figure: Monitoring / Maintenance-as-a-Service and other forms of Analytics-as-a-Service. ATMs (receipt printer, card reader, keypad, cash dispenser, cash deposit unit, other internal motors) with embedded sensors, data transfer and asset data aggregation for the bank and / or owner of the ATMs; vehicles (engine-related, combustion, front light, internal light and exhaust sensors) with a mobile application and manufacturing / assembly monitoring; a gas station with an intelligent appliance; data flows to a cloud repository, analytics and a dashboard.]
Sensor data from the equipment
[Figure: factory equipment (gas generator). Influencing variables: temperature, water pressure, humidity, speed, power. Response variable: gas concentration.]
• 200 sensors on every piece of equipment on the shop floor
• Sensors emit data every 500 milliseconds
Predictive analytics is performed to understand the underlying data patterns and to predict abnormal gas concentration in the equipment.
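To make the gas-concentration example concrete, here is a minimal sketch, assuming a linear relationship between the influencing variables and the response. The variable order, the synthetic data and its coefficients, and the 3-sigma residual rule are illustrative assumptions, not the case study's actual model. It fits ordinary least squares in plain Python and flags a reading as abnormal when the observed gas concentration deviates from the prediction by more than three residual standard deviations.

```python
import random
import statistics

# Row order of the hypothetical influencing variables (from the figure's labels).
FEATURES = ("temperature", "water_pressure", "humidity", "speed", "power")


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y, with intercept."""
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def true_gas(row):
    """Synthetic ground truth, assumed purely for illustration."""
    t, wp, h, s, p = row
    return 2.0 + 0.3 * t + 1.5 * wp + 0.05 * h + 0.02 * s + 0.1 * p


rng = random.Random(42)
train_rows = [
    (rng.uniform(60, 90), rng.uniform(1, 5), rng.uniform(30, 70),
     rng.uniform(100, 200), rng.uniform(10, 50))
    for _ in range(500)
]
train_y = [true_gas(r) + rng.gauss(0, 0.5) for r in train_rows]

beta = fit_ols(train_rows, train_y)
residuals = [y - predict(beta, r) for r, y in zip(train_rows, train_y)]
threshold = 3 * statistics.stdev(residuals)


def is_abnormal(row, observed):
    """Flag a reading whose gas concentration is > 3 residual std devs from the model."""
    return abs(observed - predict(beta, row)) > threshold


reading = (75.0, 3.0, 50.0, 150.0, 30.0)
print(is_abnormal(reading, true_gas(reading)))         # → False
print(is_abnormal(reading, true_gas(reading) + 10.0))  # → True
```

A linear model is only the simplest choice; with 200 sensors per machine, a real deployment would likely need feature selection and non-linear models, but the fit-then-flag-residuals pattern stays the same.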
But …
Current capabilities of Industry 4.0 segments in Analytics
Q.: Are companies ready for more predictive and innovative kinds of solutions?
A.: “Not yet”
• 58% of the companies have capabilities to collect data and analyze it
• Only 40% of the companies can predict based on existing data
• Fewer still, only 36%, can optimize operations from data insights
Source: Based on a survey conducted by GE and Accenture, 2015
Three-Tier Scenarios in Business Analytics
[Figure: three tiers (Sensor and Other Data Sources; Integration; Analytics & Reporting) connected by low-latency and higher-latency paths, drawing on ERP data, an EDW and in-memory stores]
• Real-time, near real-time, or batch / off-line, depending on the acceptable latency of decision-making and the cost-of-ownership
• Analytical situations differ between training mode and production
Managing Industrial data sources and analytics infrastructure - Open Source Infrastructure -
• Data Lake Analytics: Hadoop for distributed storage, Apache Spark for distributed computing
• Edge Analytics: Apache Spark Streaming for low-latency analytics
Business Analytics – Types of analysis
Depending on the use case, the analytical approaches differ:
• Offline analysis is performed on static data: Data Lake Analytics (or Data Store Analytics)
• Online analysis is performed on data as it streams in: Edge Analytics
Types of analytics tools (Open Source)
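The offline / online distinction can be illustrated with a simple statistic. The sketch below is plain Python with hypothetical readings: it computes the mean and variance of stored data with the whole dataset at hand, and the same quantities incrementally with Welford's algorithm, which updates per reading and never retains the stream.

```python
import math
import statistics


def offline_stats(readings):
    """Offline analysis: the complete, stored dataset is available at once."""
    return statistics.fmean(readings), statistics.pvariance(readings)


class OnlineStats:
    """Online analysis: Welford's algorithm updates mean and variance one reading
    at a time, in constant memory, without storing the stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance, to match statistics.pvariance."""
        return self.m2 / self.n if self.n else 0.0


readings = [71.2, 71.9, 72.4, 73.0, 72.8, 74.1, 73.6]  # hypothetical temperatures
batch_mean, batch_var = offline_stats(readings)

online = OnlineStats()
for r in readings:  # readings "arrive" one at a time
    online.update(r)

print(math.isclose(batch_mean, online.mean), math.isclose(batch_var, online.variance))  # → True True
```

Both paths give the same answer; the online version trades the ability to revisit old data for constant memory and immediate results, which is the trade-off between Data Lake Analytics and Edge Analytics.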
Case-Study
Industry 4.0 opportunities in the manufacturing unit of a leading packaging and processing company
Problem Statement
The infrastructure in the organization comprises ERP (Enterprise Resource Planning) systems, Business Warehouse (BW) units, and traditional transactional databases (MES – Manufacturing Execution Systems) for capturing and analyzing sensor data and operations data
Data ingestion, storage and processing are all performed in the current environment, which consists of traditional data stores
The architecture leaves very little scope for analysis of massive data or for near-real-time analytics. Adding more BW support, databases and compute power risks a HUGE cost
The current setup lacks:
– Infrastructure to ingest / store massive data
– A framework to perform large-scale data analytics
Industry 4.0 Business Analytics – A case-study in the manufacturing unit of a leading packaging and processing company
Solution Overview
Existing infrastructure (status quo):
– On-line basic processing
– Data input (batch into SQL Server)
– Storage and EDW
– Processing and analysis (Business Warehouse)
Proposed infrastructure for business analytics:
– Machine data and sensor data
– Data ingestion (Kafka)
– Storage (central + distributed): HDFS
– Processing (central + distributed): Spark
Analytics types: descriptive, diagnostic, predictive, prescriptive
The right-hand side (proposed architecture) depicts the overall solution for analyzing both offline (historical) data and near-real-time data
Proposed End-to-End Infrastructure
[Figure: machine and sensor data (sensor logs: temperature, pressure, water temperature, speed, humidity, ambient concentration) flowing through the proposed infrastructure, with alerts to the operator]
Why Apache Hadoop?
Current infrastructure: shop floor system, data acquisition system, MES (Manufacturing Execution System) and SQL Server; data is reflected after 24 hours
– Bounded by the actual size of the database
– May need to perform truncation
– Cannot support unstructured data
– Scope for analytics is reduced
Proposed infrastructure: data appropriately routed to reach the HDFS cluster in Hadoop
– Scalable
– No license fees
– Distributed storage
– Supports structured and unstructured data
What is Apache Hadoop?
• Apache Hadoop is an open source framework built for:
– Distributed storage – HDFS (Hadoop Distributed File System)
– Distributed processing – MapReduce
• HDFS stores large files (structured and unstructured data) across several machines (laptops, PCs and commodity servers)
• Even though the data is spread across several machines in the cluster, the user is still guaranteed a “universal” view of the data, via a single management interface
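The Map / Reduce processing model can be sketched in plain Python. This is a teaching simulation, not Hadoop's Java API: the log format, the sensor IDs and the choice of computing each sensor's maximum reading are assumptions. In a real cluster, each mapper would run next to an HDFS block and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical "sensor_id,timestamp_ms,value" log lines, split into blocks
# the way HDFS splits a large file across machines.
BLOCKS = [
    ["T1,0,71.5", "T2,0,40.1", "T1,500,72.0"],
    ["T2,500,41.3", "T1,1000,70.9", "T3,1000,12.0"],
]


def map_phase(block):
    """Mapper: parse each line and emit (key, value) pairs."""
    for line in block:
        sensor_id, _ts, value = line.split(",")
        yield sensor_id, float(value)


def shuffle(pairs):
    """Shuffle/sort: group all values by key (done by the framework in real MapReduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reducer: collapse each key's values into one result (here, the maximum)."""
    return {key: max(values) for key, values in groups.items()}


mapped = chain.from_iterable(map_phase(b) for b in BLOCKS)
result = reduce_phase(shuffle(mapped))
print(result)  # → {'T1': 72.0, 'T2': 41.3, 'T3': 12.0}
```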
Inexpensive Storage
• Without the hassle of purchasing or licensing specialized hardware
• Having the capacity to store structured and unstructured data originating from sensors
Ability to scale easily
• No compromise on data storage
• No truncation of data points originally captured
Preliminary Analysis
• Seamless integration with developer systems
• Universal access of stored data within the cluster
Analytics Infrastructure – Why Hadoop?
Problem Statement
• The existing infrastructure cannot store petabytes of sensor data (there are about 15-20 sensor tags in each part of the equipment on the shop floor, with the sensors emitting data signals every 500 milliseconds)
• As a result, it becomes challenging to perform even simple off-line analysis on ALL the sensor data
• In addition, the organization does not want to incur additional licensing fees; and while cloud subscription fees are more affordable, data security concerns and the latency of upstreaming the needed data to perform analytics are caveats
Proposed Solution
• Storing all the individual sensor values from all the factory equipment on the shop floor requires inexpensive storage that can also scale as and when needed
• Moreover, since the organization does not want to incur the additional cost of purchasing special hardware to host petabytes of sensor data, the best solution is an open source distributed data sink that can be easily installed on inexpensive commodity hardware and that inherently scales as more machines are added to the cluster
• Hence, Apache Hadoop was chosen as the data store to host sensor data and to perform OFFLINE analytics
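A back-of-the-envelope calculation shows how quickly the sensor data accumulates. The 200 sensors per piece of equipment and the 500 ms interval come from the case study; the record size and the number of machines are hypothetical assumptions, and the replication factor of 3 is the HDFS default.

```python
SENSORS_PER_EQUIPMENT = 200   # from the case study
INTERVAL_MS = 500             # from the case study
BYTES_PER_READING = 100       # assumption: tag id, timestamp, value and metadata
EQUIPMENT_COUNT = 50          # assumption: hypothetical number of machines
HDFS_REPLICATION = 3          # HDFS default block replication factor
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(sensors=SENSORS_PER_EQUIPMENT, interval_ms=INTERVAL_MS):
    """Number of readings one piece of equipment emits per day."""
    return sensors * SECONDS_PER_DAY * 1000 // interval_ms


raw_bytes_per_year = readings_per_day() * BYTES_PER_READING * EQUIPMENT_COUNT * 365
raw_tb_per_year = raw_bytes_per_year / 1e12  # decimal terabytes
stored_tb_per_year = raw_bytes_per_year * HDFS_REPLICATION / 1e12

print(f"{readings_per_day():,} readings/day per machine")  # → 34,560,000 readings/day per machine
print(f"{raw_tb_per_year:.1f} TB/year raw, {stored_tb_per_year:.1f} TB/year in HDFS")  # → 63.1 TB/year raw, 189.2 TB/year in HDFS
```

Every term is a multiplication, so doubling the assumed record size or fleet size doubles the storage needed; this linear growth is why horizontally scalable storage on commodity hardware matters.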
Analytics Infrastructure – Why Apache Spark?
Problem Statement
• Hadoop can store petabytes of machine sensor data (both structured and unstructured), but when it comes to complex analytics over that massive VOLUME of distributed data, Hadoop falls short of efficient, quick computation
• MapReduce operations do not fare well when the user joins two or more large datasets with several complex join conditions. Overall, MapReduce tasks generate a lot of overhead by re-reading and parsing data, which reduces their computational efficiency to a LARGE extent, even for off-line analysis
• In addition, building complex models or applications in Hadoop requires deep Java programming skills
Proposed Solution
• Given the volume of distributed data, the natural open source framework for efficient, parallelized computing is APACHE SPARK
• The main advantage of adopting Spark is that it also runs on commodity hardware: it pools ALL the memory of the available machines in the cluster, assigns jobs to them and orchestrates their execution in parallel, thereby saving time and cost and lowering execution time.
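The re-reading overhead, and the in-memory remedy, can be illustrated with a plain-Python analogy. This is not the Spark API (in PySpark the corresponding step would be calling `cache()` on an RDD or DataFrame), and the partitions and records are hypothetical. Each "MapReduce-style" job re-parses its input, whereas the "Spark-style" path parses once, keeps the partitions in memory and reuses them for two jobs; per-partition work goes through a thread pool to mirror partition-parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw records ("sensor,value"), one list per partition.
RAW_PARTITIONS = [
    ["T1,71.5", "T2,40.1", "T1,72.0"],
    ["T2,41.3", "T1,70.9", "T3,12.0"],
]

parse_calls = 0  # counts how many times a partition is read and parsed


def parse(partition):
    """Simulate reading a partition from disk and parsing it into (sensor, value) tuples."""
    global parse_calls
    parse_calls += 1
    return [(sensor, float(value)) for sensor, value in (line.split(",") for line in partition)]


def run_job(partitions, per_partition, combine):
    """Apply per_partition to every partition in parallel, then combine the partial results."""
    with ThreadPoolExecutor() as pool:
        return combine(list(pool.map(per_partition, partitions)))


def count_job(parsed):
    return run_job(parsed, len, sum)


def max_job(parsed):
    return run_job(parsed, lambda part: max(v for _, v in part), max)


# MapReduce-style: each job re-reads and re-parses its input.
parse_calls = 0
mr_results = (count_job([parse(p) for p in RAW_PARTITIONS]),
              max_job([parse(p) for p in RAW_PARTITIONS]))
mr_parses = parse_calls

# Spark-style: parse once, keep partitions in memory, reuse them for both jobs.
parse_calls = 0
cached = [parse(p) for p in RAW_PARTITIONS]
spark_results = (count_job(cached), max_job(cached))
spark_parses = parse_calls

print(mr_results, mr_parses)        # → (6, 72.0) 4
print(spark_results, spark_parses)  # → (6, 72.0) 2
```

Both paths produce identical results; the saving is in repeated I/O and parsing, which grows with every additional job run over the same data.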
Apache Spark for Business Analytics in Industry 4.0
What is Apache Spark?
• Apache Spark is an open source cluster-computing framework built to overcome the limitations of Hadoop’s MapReduce computing framework
• Spark was mainly built to achieve:
– Parallelism in data operations
– Distributed computing using the RAM available across the machines in the cluster
– Fault tolerance
– Scalability
• “Blends” in with Hadoop
Apache Spark – Indispensable Components: Spark Core, Spark Streaming, Spark SQL, Spark MLlib, GraphX
Apache Spark Streaming – BA Infrastructure for Industry 4.0
Problem Statement
• Data in a streaming analytics environment is processed (on-line) before it lands in a database
• Currently, in the manufacturing unit of a leading packaging and processing company, machine sensor data is stored after a “significant change” is detected by the data acquisition system
• In addition, this data is truncated: sensor readings that may indicate the future working status of the equipment are lost when the data hits the database
• There is no real-time streaming analytics to predict alerts earlier
Proposed Solution
• What if this sensor data is analyzed as it streams in? Spark Streaming
• And what if decisions were made before it hits the database? Spark Streaming
• The analyzed “industrial big data” can then flow into a database of their choice (SQL Server), a distributed database (Cassandra, HBase) or a distributed file system (HDFS).
Apache Spark Streaming
Analyzing streams of real-time sensor data as they arrive, instead of running large, data-intensive batch jobs on a daily basis
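Spark Streaming processes a live stream as a sequence of small micro-batches, and window operations combine the most recent batches. The sketch below imitates that model in plain Python rather than using Spark; the 2-second batch interval, the 3-batch window, the alert level and the readings are all hypothetical. The alert decision is taken on each micro-batch before its readings are appended to storage, and every reading is kept rather than truncated.

```python
from collections import deque
from statistics import fmean

INTERVAL_MS = 500     # sensor emission interval, from the case study
BATCH_MS = 2000       # assumption: micro-batch interval
WINDOW_BATCHES = 3    # assumption: window covers the last 3 micro-batches (6 s)
ALERT_LEVEL = 5.0     # hypothetical gas-concentration alert threshold


def micro_batches(stream, batch_ms=BATCH_MS):
    """Group time-ordered (timestamp_ms, value) readings into consecutive micro-batches."""
    batch, batch_end = [], batch_ms
    for ts, value in stream:
        while ts >= batch_end:  # close every batch whose interval has elapsed
            yield batch
            batch, batch_end = [], batch_end + batch_ms
        batch.append(value)
    if batch:
        yield batch


def process(stream):
    """Decide on each micro-batch before persisting it; nothing is truncated."""
    window = deque(maxlen=WINDOW_BATCHES)
    alerts, stored = [], []
    for i, batch in enumerate(micro_batches(stream)):
        window.append(batch)
        values = [v for b in window for v in b]
        if values and fmean(values) > ALERT_LEVEL:
            alerts.append(i)  # e.g. notify the operator
        stored.extend(batch)  # then write every reading to the sink (HDFS, SQL Server, ...)
    return alerts, stored


# Hypothetical stream: 12 s of readings; gas concentration jumps from 4.0 to 8.0 at t = 8 s.
stream = [(k * INTERVAL_MS, 4.0 if k < 16 else 8.0) for k in range(24)]
alerts, stored = process(stream)
print(alerts, len(stored))  # → [4, 5] 24
```

The window length controls the trade-off: a longer window smooths out noisy single readings but raises the alert later after a genuine change.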
Apache Kafka – BA Infrastructure for Industry 4.0
Problem Statement
Use-case in a real-life scenario:
• The leading packaging and processing firm has about 200 sensors in every piece of equipment and machinery they own. These sensors emit data every 500 milliseconds; this data has to be correctly captured, queued, analyzed and stored for further analysis
• When many real-time applications “consume” data from thousands of these sensors for reporting and analytics, “requesting data” from the sensors becomes criss-crossed and haphazard
• In addition, there is a risk of losing data midway, listening to the “wrong” sensor reading, or receiving messages out of order
Proposed Solution
Apache Kafka can simplify the current messaging architecture by using a producer-consumer approach and orchestrating messaging services as a broker.
The coordination, replication, fault tolerance, partitioning and parallelism of this architecture are handled entirely by the Kafka server.
Producers publish to TOPICS; Kafka orchestrates messaging; subscribers listen to these TOPICS
[Figure: Apache Kafka. Producers and consumers (analytics / modelling, databases, streaming applications) connect to a Kafka broker coordinated by Zookeeper; Topic 1 and Topic 2 are split into partitions (Partitions 1-4), each an ordered sequence of messages numbered 0, 1, 2, …]
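The producer / broker / consumer pattern can be sketched with a small in-memory model. `ToyBroker` is a hypothetical teaching class, not the Kafka client API, and it omits replication, Zookeeper coordination and persistence. It shows two properties described above: messages with the same key always land in the same partition (so per-sensor order is preserved), and each consumer group tracks its own offsets, so several applications can read the same topic independently. Kafka's default partitioner hashes keys with murmur2; `zlib.crc32` is swapped in here only to keep the example deterministic.

```python
import zlib
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker: each topic is a list of append-only partition logs."""

    def __init__(self):
        self.topics = {}
        self.offsets = defaultdict(int)  # (group, topic, partition) -> next offset to read

    def create_topic(self, name, partitions):
        self.topics[name] = [[] for _ in range(partitions)]

    def publish(self, topic, key, value):
        """Producer side: the same key always maps to the same partition."""
        logs = self.topics[topic]
        partition = zlib.crc32(key.encode()) % len(logs)
        logs[partition].append((key, value))
        return partition, len(logs[partition]) - 1  # (partition, offset)

    def poll(self, group, topic):
        """Consumer side: return unread messages and advance this group's offsets."""
        batch = []
        for partition, log in enumerate(self.topics[topic]):
            start = self.offsets[(group, topic, partition)]
            batch.extend(log[start:])
            self.offsets[(group, topic, partition)] = len(log)
        return batch


broker = ToyBroker()
broker.create_topic("sensor-readings", partitions=4)
for i in range(6):
    broker.publish("sensor-readings", key=f"sensor-{i % 3}", value=70.0 + i)

analytics = broker.poll("analytics", "sensor-readings")  # one consumer group
archive = broker.poll("archive", "sensor-readings")      # an independent group
again = broker.poll("analytics", "sensor-readings")      # nothing new for "analytics"
print(len(analytics), len(archive), len(again))  # → 6 6 0
```

Ordering is guaranteed only within a partition, which is why the sensor ID is used as the key: all readings from one sensor stay in sequence, while different sensors spread across partitions for parallelism.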
Apache Kafka – for Industry 4.0 / IoT
Apache Kafka’s role in Manufacturing
Use cases in IIoT (Industrial Internet of Things)
- Real time stream processing (coupled with Spark Streaming)
- General purpose message bus
- Collecting user activity data
- Collecting operational metrics from sensors, applications, servers or devices
- Log aggregation
- Change data capture
- Maintaining a commit log for distributed systems
Source: Confluent
Cloud Strategy
• Reduce cost-of-ownership, simplify management of IT operations, and shorten the path from invention to delivery of new applications
• Develop a new business model opportunity by creating a domain-specific service-suite accessible to subscribers or pay-per-use through APIs
Cloud for analytics capabilities is a very important option to manage the complexity of Business Analytics infrastructure:
• Scalability, high availability
• Data model flexibility, data mobility
• Seamless work with an ecosystem of apps and tools
• Built-in analytical tools for faster, more efficient data analysis on-line and off-line
• CRITICAL: smooth integration with ERP capabilities, thus building better bridges between process management and the information life-cycle in the Industry 4.0 enterprise
Offerings for key open source BA infrastructure
IBM BigInsights: available on-premises, on-cloud, and integrated with other systems in use today
• IBM Open Platform with Apache Hadoop (HDFS, YARN, MapReduce, Ambari, Flume, HBase, Hive, Kafka, Knox, Oozie, Pig, Slider, Solr, Spark, Sqoop, Zookeeper)
• IBM BigInsights Analyst: Big SQL, BigSheets
• IBM BigInsights Data Scientist: Big SQL, BigSheets, Text Analytics, Machine Learning on Big R, Big R (R support)
• IBM BigInsights Enterprise Management: POSIX Distributed File System, multi-workload, multi-tenant scheduling
• Free Quick Start (non-production): IBM Open Platform, BigInsights Analyst and Data Scientist features, community support
IBM BigInsights – On-premises version
The Open Source on Cloud by SAP
SAP HANA Vora: SAP HANA + Vora + Hadoop (also on premises)
• SAP HANA Vora integrates SAP HANA data with data lakes (Hadoop)
• Seamless integration with HANA + Spark + Hadoop
• One can archive ERP data from HANA to Hadoop
• Integrated BI
More ICT offerings for key BA infrastructure on Cloud
• MapR, Hortonworks, Cloudera: on premises and cloud
• Microsoft Azure HDInsight: cloud service
Offerings for key Open Source BA infrastructure
Microsoft Azure HDInsight (cloud service). Components offered:
• Apache Hadoop / YARN
• Apache Tez
• Apache Pig
• Apache Hive
• Apache HBase
• Apache Sqoop
• Apache Oozie
• Apache Zookeeper
• Apache Storm
• Apache Mahout
• Apache Spark
Amazon Web Services – Support for Open Source Capabilities on the Cloud
Hadoop – Elastic MapReduce
• HDFS is automatically installed with Hadoop on Amazon’s EMR (Elastic MapReduce) cluster
• EMR is Amazon’s managed Hadoop framework service
• Amazon EMR makes clusters easy to tune and helps reduce infrastructure maintenance and operational costs
• Supports multiple data stores
• Since it is elastic, one can provision 100s or 1000s of compute instances to process large datasets (increasing and decreasing the number of instances as needed)
Source: https://aws.amazon.com/emr/
Apache Spark – Elastic MapReduce
• Spark is also supported on Amazon EMR clusters
• In-memory caching, optimized execution, general batch processing, streaming analytics, machine learning, graph databases and ad-hoc queries are all supported in the cloud by Amazon EMR and EC2 (Elastic Compute Cloud)
• Amazon Elastic Compute Cloud (EC2) is a web service that provides resizable compute capacity in the cloud
Source: https://aws.amazon.com/emr/details/spark//
Commercial vehicles for delivering viable BA solutions
IBM Predictive Maintenance
• Predict asset failure / extend life:
– Determine failure based on usage characteristics
– Identify conditions that lead to high failure
• Predict part quality:
– Detect anomalies within the process
– Conduct in-depth root cause analysis
IBM Watson Explorer (on premises and in cloud): analytics on dispersed sources of structured and unstructured data
Commercial vehicles for delivering viable BA solutions
IBM Streams
On premises and in cloud
Commercial vehicles for delivering viable BA solutions
SAP HANA and Analytics
[Figure: SAP HANA Platform. Development: JavaScript, SQLScript, SQL; web server; application & UI services. Engines and libraries: spatial, search, text mining; stored procedures & data models; business function library; predictive analytics library; database services; planning engine; rules engine. Connected sources and channels include transactional, unstructured, machine, Hadoop, real-time and location data, cloud applications, analytics, Excel, IoT, mobile / web and API access.]
SAP offers near-real-time in-memory computing capabilities and efficient reporting through HANA
Commercial vehicles for delivering viable BA solutions
SAP Smart Data Streaming (on premises)
Opportunity for Luxembourg: High specialization in selected capabilities leading to new ICT Solutions with impact on Industry 4.0
Business Analytics – Key Topics
• Reference Architectures: architectures integrating process and big data; architectures for cloud-based applications and services
• Infrastructure: affordable options for big data and analytics needs
• Research and Innovation: create a sandbox of new and custom algorithms for quick PoCs; define an API App Cloud for Industry 4.0
• Security of Network Systems: how to use data analytics for better security
• Legal Framework: GDPR and data confidentiality assurance in Industry 4.0
• Training and Education: programs in BA for executives, managers and engineers
• Others: proper funding for collaboration between start-ups / R&D / industry
Paths to make BA viable for Industry 4.0
1. Focus on specific areas where large data sources and analytics may lead to operational savings and new business
– Do not boil the ocean with exotic mega-plans or tough ROI cases
– Start simple, with quick wins for executive management buy-in
2. Get help to assess the viability of the initiative very quickly (technically and financially)
– If your organization does not have the specific skills, do not rely on internal-only assessments (strong IT or engineering teams do not necessarily know BA)
– Partner with an R&D organization that can help you assess (for example, the FEDER project in LIST)
3. Use rented third-party ICT infrastructure to discover and validate solutions, architectures and options in depth
4. If you can afford to do some BA work on internal infrastructure, test ideas using simple tools
– Open tools are appealing because they carry no license cost, but the skills needed to use them properly and maintain an informal development environment are very specialized
– Partner with an organization that can help you define a good architecture for the solution toward a fast Go / No-go PoC (i.e., fail fast and cheaply)
Open source software - Considerations
• Hadoop – Programming language support: Java; Runtime considerations: distributed disk access, no firewall between intended systems; Platform support: JDK, Java; OS: Linux, Windows, Mac OS
• Spark – Programming language support: Scala, Java, Python, R via SparkR; Runtime considerations: distributed memory access, no firewall between intended systems, ICMP protocol should not be blocked; Platform support: JDK, Java; OS: Linux, Windows, Mac OS
• Spark Streaming – Programming language support: Scala, Java, Python; Runtime considerations: distributed memory access, system ports should not be blocked by firewall; Platform support: JDK, Java; OS: Linux, Windows, Mac OS
• Kafka – Programming language support: several, including Scala, Java, Python, stdin/stdout; Runtime considerations: system ports should not be blocked by firewall; Platform support: JDK, Java, Zookeeper; OS: Linux, Windows, Mac OS
• SAP Smart Data Streaming – Programming language support: CCL; Runtime considerations: Smart Data Streaming should be hosted on a separate server; Platform support: [TBD]; OS: Linux