Big Data from the trenches
Advice from the FSI industry
By: Azrul MADISA
About me…
• VP – Enterprise Data Architect @ Maybank
• Take care of Maybank’s data worldwide
• Nuts about data, analytics and software dev.
• Very hands-on, love to read
• Teach aikido to kids
Big Data landscape today
https://www.linkedin.com/pulse/big-data-still-thing-2016-landscape-matt-turck
Too many big data tech?
Wait … what?
I have to know ALL that?
Let’s change the game a bit…
Use case
The data journey
• Stages: Acquisition, Dumping, Tidy data, Analytical model, Real-time analytics, Sandbox
Example: credit scoring and loan origination
• Acquisition: Screens
• Dumping: Data staging area
• Tidy data: Data warehouse
• Analytical model: Score card builder
• Real-time analytics: Decisioning
• Sandbox: Data scientist
Acquisition with quality
• Manage data quality up front
• Human-factor data quality
– Data entry → Application → Data staging (overnight)
– Audit trail (weekly)
Acquisition with quality
• Non-human error
• Use PEWMA algorithm
https://aws.amazon.com/blogs/iot/anomaly-detection-using-aws-iot-and-aws-lambda/
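A minimal sketch of PEWMA (probabilistic exponentially weighted moving average) for one numeric feed, to make the bullet above concrete. The class name `Pewma`, the parameter values (`alpha0`, `beta`, `training`, `threshold`) and the sample readings are assumptions for illustration, not settings from the talk.

```python
import math


class Pewma:
    """Probabilistic EWMA anomaly detector for a single numeric stream."""

    def __init__(self, alpha0=0.95, beta=0.5, training=30, threshold=3.0):
        self.alpha0 = alpha0        # base smoothing factor (assumed value)
        self.beta = beta            # how much a point's probability changes its weight
        self.training = training    # warm-up points before anything is flagged
        self.threshold = threshold  # flag when |x - mean| > threshold * std
        self.t = 0
        self.s1 = 0.0               # running estimate of E[x]
        self.s2 = 0.0               # running estimate of E[x^2]

    def update(self, x):
        """Feed one observation; return True if it looks anomalous."""
        self.t += 1
        std = math.sqrt(max(self.s2 - self.s1 ** 2, 0.0))
        z = (x - self.s1) / std if std > 0 else 0.0
        # Standard normal density of the z-score: close to zero for surprising points.
        p = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        is_anomaly = self.t > self.training and std > 0 and abs(z) > self.threshold
        if self.t <= self.training:
            alpha = 1 - 1 / self.t  # plain running average while warming up
        else:
            alpha = (1 - self.beta * p) * self.alpha0
        self.s1 = alpha * self.s1 + (1 - alpha) * x
        self.s2 = alpha * self.s2 + (1 - alpha) * x * x
        return is_anomaly


# Hypothetical feed: a stable reading with small noise, then a sudden spike.
detector = Pewma()
readings = [10.0, 10.2, 9.8, 10.1, 9.9] * 10 + [25.0]
flags = [detector.update(x) for x in readings]
```

Only the spike at the end should be flagged. The threshold and warm-up length would need tuning per feed.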
Data sandbox
Creating a sandbox on the cloud
• Why cloud:
– Scale data discovery as needed
– Merging private with public data
– Less bureaucratic
• But…
– Customer data on the cloud is a no-no
Creating a sandbox on the cloud
• Masking
– Non-numerical data => No sweat!
– E.g.
• En. Abdul Jalil => 837x2unxy237e832!@
• 720324-03-8891 => 472376-84-8732
• Masking numerical data?
• What if there is a way to mask numerical data while keeping the statistical properties intact?
– Easier for the regulators to digest
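For the non-numerical case, one possible approach is keyed, deterministic masking that keeps the layout of the value. This is a sketch only: the function name `mask_value` and the key are hypothetical, and the talk does not say which masking tool was used. Unlike the name example on the slide, it replaces letters with letters and digits with digits, so the masked value has the same shape as the original.

```python
import hashlib
import hmac
import string


def mask_value(value: str, key: bytes) -> str:
    """Replace letters and digits with keyed pseudo-random ones, keeping the layout."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]  # reuse digest bytes for long values
        if ch.isdigit():
            out.append(string.digits[b % 10])
        elif ch.isalpha():
            letters = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(letters[b % 26])
        else:
            out.append(ch)  # keep separators such as "-", "." and spaces
    return "".join(out)


# Hypothetical key; in practice it would stay on premise, never in the sandbox.
masked_id = mask_value("720324-03-8891", b"demo-key")
masked_name = mask_value("En. Abdul Jalil", b"demo-key")
```

Because the same input and key always give the same output, masked identifiers can still be used to join tables inside the sandbox.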
Creating a sandbox on the cloud
• Random projection
• Usually used for dimension reduction
– Original data (M x N) x Random matrix (N x N) = Masked data (M x N)
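A small sketch of the multiplication above. It assumes the random matrix is orthogonal (built here with Gram-Schmidt on Gaussian vectors); the slides only say "random matrix", so that choice, the helper names and the sample figures are all assumptions. With an orthogonal matrix the distances between records are preserved, which is one statistical property that survives masking; per-column statistics such as means do not.

```python
import math
import random


def random_orthogonal(n, rng):
    """Return an n x n matrix with orthonormal rows (Gram-Schmidt on random vectors)."""
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:  # skip the rare degenerate draw
            basis.append([vi / norm for vi in v])
    return basis


def mask(data, r):
    """Masked data (M x N) = original data (M x N) times random matrix (N x N)."""
    n = len(r)
    return [[sum(row[k] * r[k][j] for k in range(n)) for j in range(n)] for row in data]


# Hypothetical records: income, number of loans, credit score.
data = [[52000.0, 3.0, 710.0], [48000.0, 1.0, 640.0], [91000.0, 4.0, 780.0]]
r = random_orthogonal(3, random.Random(42))
masked = mask(data, r)

# Distance between the first two customers, before and after masking.
before = math.dist(data[0], data[1])
after = math.dist(masked[0], masked[1])
```

The sketch uses only the standard library to stay self-contained; a numerical library would be the usual choice at scale.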
Fast real-time vs. batch analytics
Fast real-time analytics
• ‘Batch’ analytics:
– User → Application → overnight batch → Data warehouse
– Data warehouse → descriptive analytics and predictive analytics
– Analytical model for real-time decisioning: updated monthly
Fast real-time analytics
• So what is real-time analytics?
– User → Application → real-time analytics and decisioning
– Analytical model updated in real time
– Predictive analytics: batch analytical model and real-time analytical model
Fast real-time analytics
• Q-learning
• E.g. SMS advertisement campaign
– Marketing system sends location and user info to the real-time analytical model
– Real-time analytical model picks the SMS campaign
– Customer changes behaviour (e.g. buys something else)
– Model learns the new behaviour
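A toy sketch of tabular Q-learning for the SMS campaign loop, to show how the model can keep learning from each response. Everything concrete here is hypothetical: the states (`mall`, `stadium`), the response probabilities, and the learning parameters. Only the three campaign types come from the talk.

```python
import random

ACTIONS = ["sports", "concerts", "movies"]   # campaign types
STATES = ["mall", "stadium"]                 # hypothetical customer locations

# Hypothetical chance that a customer responds to a campaign at a location.
RESPONSE = {
    "mall": {"sports": 0.1, "concerts": 0.2, "movies": 0.7},
    "stadium": {"sports": 0.8, "concerts": 0.2, "movies": 0.1},
}


def choose(q, state, epsilon, rng):
    """Epsilon-greedy: mostly send the best-known campaign, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(steps=20000, alpha=0.02, gamma=0.5, epsilon=0.1, seed=7):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        action = choose(q, state, epsilon, rng)
        # Reward 1 if the simulated customer responds to the SMS, else 0.
        reward = 1.0 if rng.random() < RESPONSE[state][action] else 0.0
        next_state = rng.choice(STATES)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: move the estimate towards reward + discounted future value.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def best_campaign(q, state):
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = train()
```

Because the update runs on every message, a shift in customer behaviour (changing the `RESPONSE` table mid-run) would gradually move the Q-values without a batch retrain.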
Fast real-time analytics: real-time analytics in action
[Chart: INTEREST (0 to 0.5) against MESSAGES for three series, SPORTS, CONCERTS and MOVIES, showing interest in concerts, movies and sports changing over time]
Real-time analytical tracking and learning of people’s interest
Putting it all together under one architecture
Data architecture
• Some difficult questions around big data and analytics
– How can I invest in big data while managing cost?
– How can I “experiment” with big data while mitigating risks?
– How can I create a 360 view of data without boiling the ocean?
– How can I use overseas data without violating regulations?
Tiered data architecture
• Data consumer
– Access through SQL / REST / SOAP / MQ
• Data virtualization
– Official data model
• Data stores
– Data warehouse (staging, SQL access)
– Big Data infra (e.g. Hadoop)
– Real-time store
– Master / reference data
– Social / cloud public data
– Overseas data
• Data sources
– Data sources (batch and real-time)
– Overseas data sources
– Social network (batch)
Tiered data architecture
• Investment / level of support
– Data virtualization: Level 1
– Master data: Level 1
– Fast data: Level 1
– Hot data: Level 2
– Cold data: Level 3
– Investment dimensions: CPU / memory, storage
Tiered data architecture
• Invest where it matters
– Defer investment if needed
– Refocus investment without disrupting business
• Data virtualization
– Create a façade for data access
– Provide standard interface for data
– Single data model, single access, single quality checkpoint
• Allow ‘experimentation’
– E.g. cut-off point for hot / cold
• Overseas data access
– Data stays where it is; only aggregated data is transferred back
– More palatable to regulators
• 360 view
– Data can be ‘joined’ through the data virtualization layer – no laborious ETL needed
• Single place to check for data quality
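The points above can be sketched as a small façade: one access point that decides which tier to read, with the hot / cold cut-off as a tunable setting and overseas data reachable only as an aggregate. This is an illustration under assumptions; `DataFacade`, `Txn`, `overseas_total` and the 90-day cut-off are hypothetical names and values, not part of the architecture described in the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Txn:
    customer_id: str
    day: date
    amount: float


class DataFacade:
    """Hypothetical single access point in front of the tiered stores."""

    def __init__(self, hot, cold, overseas_total, cutoff_days=90):
        self.hot = hot                        # recent rows on fast storage
        self.cold = cold                      # older rows on cheap storage
        self.overseas_total = overseas_total  # callable returning an aggregate only
        self.cutoff_days = cutoff_days        # hot / cold boundary, open to experimentation

    def transactions(self, customer_id, since, today):
        cutoff = today - timedelta(days=self.cutoff_days)
        tiers = [self.hot]
        if since < cutoff:  # touch the cold tier only when the query reaches back far enough
            tiers.append(self.cold)
        rows = [t for tier in tiers for t in tier
                if t.customer_id == customer_id and t.day >= since]
        return sorted(rows, key=lambda t: t.day)

    def total_spend(self, customer_id, since, today):
        local = sum(t.amount for t in self.transactions(customer_id, since, today))
        # Overseas rows never leave their country; only the total comes back.
        return local + self.overseas_total(customer_id, since)


def overseas_total(customer_id, since):
    """Stand-in for a remote service that returns a pre-aggregated figure."""
    return 75.0 if customer_id == "C1" else 0.0


today = date(2016, 6, 30)
hot = [Txn("C1", date(2016, 6, 1), 120.0), Txn("C2", date(2016, 6, 2), 40.0)]
cold = [Txn("C1", date(2015, 12, 15), 300.0)]
facade = DataFacade(hot, cold, overseas_total)

recent = facade.total_spend("C1", date(2016, 6, 1), today)   # hot tier + overseas aggregate
full = facade.total_spend("C1", date(2015, 1, 1), today)     # hot + cold + overseas aggregate
```

The consumer calls `total_spend` the same way in both cases; changing `cutoff_days` or moving rows between tiers does not change the interface.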
That’s all folks…
• LinkedIn:
– https://www.linkedin.com/in/azrul-madisa-6052419