Big Data : Bits of History, Words of Advice


Venu Vasudevan

GLSEC Big Data Meetup


Big Data Past

[Diagram: big, fast; intelligent media, IoT, satellites]

Big Data : Behavioral

- the ‘V’ view of Big Data challenges (number of V’s up for debate)

Big Data : Architectural

[Diagram: untidy data → firehose → clean analytics; fast & good vs. slower & much better]

- Lambda architecture
- Lake architecture
- Stream architecture

Technical


This Talk

- Behavioral View
- Technology Solution
- Stack
- ‘Middleware’ (benefit of hindsight)
- governance culture (gap)
- data economics
- ownership food fights

3 data points

[Diagram: big, fast; intelligent media, IoT, satellites]

Iridium

• mobile routers (10K mph), fixed people

• no repeated patterns

• satellites N-S movement

• earth E-W movement

• regular topology, irregular exceptions

• solar flares

• military satellite presence

Fast Data Problem

• cellular frequency allocation (graph coloring problem)

• frequent fast recalculations (fast routers + semi-fast earth)

• transmit-no transmit (solar flares, military satellite presence)

• moving ‘seam’

[Diagram: moving ‘seam’, irregularities]

Fast Data Problem

• cellular frequency allocation (graph coloring problem)

• frequent fast recalculations (fast routers + semi-fast earth)

• transmit-no transmit (solar flares, military satellite presence)

• moving ‘seam’

• + ‘France’

[Diagram: moving ‘seam’, irregularities; broadcast = +$$$ vs broadcast = -$$$ (lawsuit)]

Fast Data Problem

• quest for (OO)DB technology to address ‘France’ as make-or-break use case

• query expressive power

• complex constraint satisfaction

• query handling throughput

• 3-4 month benchmarking effort


Fast Data Problem

• quest for (OO)DB technology to address ‘France’

• query expressive power

• query handling throughput

• 3-4 month benchmarking effort

• France solved ‘out-of-band’ (legally)

don’t overfit your architecture to an extreme requirement - unless it’s from an extreme (paying) user
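Frequency allocation as graph coloring can be sketched with a greedy heuristic: give each beam the lowest frequency not used by any already-assigned neighbor. The toy adjacency map and function name below are illustrative, not Iridium’s actual algorithm (which also had to recolor continuously as the topology moved):

```python
# Greedy graph coloring: assign each node (cell/beam) the smallest
# "frequency" index not already taken by a colored neighbor.
# The adjacency map is a toy example; real satellite footprints
# change every few seconds, forcing frequent recoloring.

def greedy_color(adjacency):
    colors = {}
    for node in sorted(adjacency):            # deterministic order
        taken = {colors[n] for n in adjacency[node] if n in colors}
        c = 0
        while c in taken:                     # lowest free frequency
            c += 1
        colors[node] = c
    return colors

beams = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
assignment = greedy_color(beams)
# by construction, no two adjacent beams share a frequency
```

Greedy coloring is fast but approximate, which matches the deck’s theme: frequent fast recalculations beat a slow optimal answer when the routers move at 10K mph.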

Big Data Problem

• systems management

• manage 66 ‘nodes’

• nodes moving at 10K mph

• ‘seam’ moving at 20K mph

• sounds harder than trivial, but not too hard

‘Pre’ Lambda Solution

• Dumb edge | smart core approach

• 15K events/sec/satellite

• 1M events/sec

• Fast & Approximate - FMEA: ’compiled’ lookup table for failure modes

• Slow & Precise - Model-based reasoning on satellite models

[Diagram: untidy satellite firehose (1M events/sec) → ‘Pre’ Lambda architecture (FMEA + Model-Based Reasoning) → actionable insights]

‘Pre’ Lambda Solution

• Dumb edge | smart core approach

• 15K events/sec/satellite

• Fast & Approximate - FMEA: ’compiled’ lookup table for failure modes

• Slow & Precise - Model-based reasoning on satellite models

• Simple, straightforward & wrong.

[Diagram: untidy satellite firehose (1M events/sec) → real-time expert system (FMEA + Model-Based Reasoning) → actionable insights]
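The fast & approximate / slow & precise split can be sketched as follows; the failure table, event fields, and function names are invented for illustration, not the actual satellite system:

```python
# 'Pre' Lambda sketch: a compiled FMEA lookup gives a fast,
# approximate diagnosis per event, while every event is also queued
# for a slower, precise model-based reasoning pass.
# The failure-mode table and event shapes are illustrative only.

FMEA_TABLE = {                     # symptom -> likely failure mode
    "battery_low": "power_subsystem",
    "temp_high": "thermal_control",
}

slow_queue = []                    # fed later to model-based reasoning

def fast_path(event):
    """O(1) table lookup: approximate, but immediate."""
    slow_queue.append(event)       # everything also gets the slow pass
    return FMEA_TABLE.get(event["symptom"], "unknown")

diagnosis = fast_path({"sat": 7, "symptom": "temp_high"})
```

The design choice mirrors the slide: the compiled table answers at firehose rates, and the model-based layer corrects it on a slower clock.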

Yet, an architecture that is ‘rinsed and repeated’ over the years

why does dumb edge | smart cloud endure?

• edges are expensive ($2B, $$$$$ at T-30 yrs); clouds are cheap ($ at T-0)

• when edges go wrong (break / blow up / collide), they make news headlines

• clouds don’t make for good news headlines

• nobody messes with an ‘edge’ once it works

• thus, implementing an end-to-end architecture causes culture clashes: “over my dead body” vs “iterate & refine”

an almost repeat (Industrial IoT)

• edges are messy & domain specific

• creating them means dealing with culture clashes

• but .. an ounce of edge is worth a pound of cloud


Things to consider

• Problem statement. What’s your ‘France’? (colorful sub-problem; strategy overfit)

• Architecture. Small fixes to the IT/OT gap can go a long way toward a simpler problem.

• Technology choices. Best practices & the risk of ‘rewardless risk’:

• right - make average programmers productive with new tech

• frequent - turn great programmers into average

Big Data to Deep Metadata

streaming video (TV) ~ 1 petabyte/day

[Diagram: timescales - second, minute, hour, day/week, epochal - mapped to use cases]

• detect & replace ads

• create playlists by Player, Play, Sentiment

• identify minor characters with a rabid fan following

• rejuvenate old content, derive new content

• ‘chapterize’ by Player, Play, Sentiment

Platform Triage Challenge : new product, new market

• one core technology, many markets

• platform triaging challenge: what drives the platform?

• highest (but uncertain) $ potential?

• ‘extreme’ requirement?

• sparsest competition?

• use case outlier is your biggest customer

[Diagram: deep metadata technology → SaaS data platform → Advertising, Search, Video concept maps]

ad replacement use case

• speed

• few days (on-demand content)

• few seconds (real-time rebroadcast with new ads)

• precision

• low - best effort, for low-cost international content for niche audiences

• high - frame level for expensive content, e.g. sports / $10M-per-episode programming

• errors

• 90% accuracy - ok for long-tail content

• ‘five nines’ for premium content

[Diagram: ad replacement opportunity space along precision, accuracy, speed; largest customer marked]

occam’s razor works (again)

• build to simplicity

• loose coupling between data engg & equipment engg

• modularize complexity

• ‘differentiate your product’ changes

• ‘necessary evil’ changes

[Architecture diagram: data-only approach → + 1st party integration (dynamically configure ad splicers) → + 3rd party knobs (dynamically refresh CDN)]

but, what if ..

• Data is untidy

• Interpretation is subjective/cultural

• Automation is aspirational but quixotic

human-powered analytics

• some analytics tasks are too ‘slippery’ for machines

• data hard to characterize

• uneven video quality of ‘old’ archives

• untidy

• insights are subjective

human-powered analytics

• some analytics tasks are too ‘slippery’ for machines

• need for human augmentation

• humans generate ‘training’ sets to bootstrap machine learning

• humans completely take over some tasks

machines vs humans

• crowdsourcing & human-powered computing

• has been the ‘next big thing’ for a while

• checkered history:

• uneven output

• fraud

• uneven throughput

Machines : fast, brittle, objective, clear

Humans : slow, malleable, subjective, nuanced

machines vs humans

• much of that has changed

• Amazon Mech Turk

• 500K active users

• the ‘human machine’ can return substantial jobs in under 30 mins

• quantifiable as a machine for many media tasks - latency, quality, error rate, throughput
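One standard way to tame uneven crowd output is redundancy plus majority vote: send each task to several workers and keep the most common answer. A minimal sketch, with made-up task IDs and labels:

```python
# Redundancy + majority vote: a common quality-control pattern for
# crowdsourced labeling (smooths uneven output, dilutes fraud).
# Task names and worker answers below are invented examples.
from collections import Counter

def majority_vote(answers):
    """Most common answer for one task (ties break by first seen)."""
    return Counter(answers).most_common(1)[0][0]

task_answers = {
    "clip_01": ["ad", "ad", "program"],      # 3 workers per task
    "clip_02": ["program", "program", "program"],
}
labels = {task: majority_vote(ans) for task, ans in task_answers.items()}
```

With per-task redundancy fixed, the ‘human machine’ becomes measurable exactly as the slide says: latency, error rate, and throughput all fall out of the answer logs.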

Hybrid Architecture

Things to consider

• Beware ‘France’ in other forms:

• the customer with the loudest voice & a ‘holy grail’ hairball

• Dealing with data quality & variability

• crowdsourcing has come a long way as a credible ‘engine’

• If big data is the answer, what is the question? (have a strong opinion, weakly held)

• decision rationalization

• process automation

• human ‘power tool’ (e.g. compelling visualization) vs imperfect automation

startup data jiu-jitsu

• How to create a data-driven strategy before the data shows up?

• rationalize future SaaS revenue models

• justify product decisions in a data-driven manner

need data for product

need product for data

startup data jiu-jitsu

• How to create a data-driven strategy before the data shows up?

• how ‘intelligent’ can lighting control be with 50-100K users?

• how do people use dimmers (continuous or quantized) — UX implications

data set dilemma

• standard sources (e.g. Kaggle & UCI) insufficient

• few ‘physical world’ datasets

• expensive to collect

• may be specialized (vendor-specific)

• dataset proxies for IoT actuation may not work

• energy utilization != switch usage

big data, small start

• physical world data likely to be smaller (1-10 homes, few months)

• setup costs limit size of public datasets

• e.g. UMass Smart* light switch dataset

big data, small start

• consider data ‘augmentation’

• standard practice in AI (deep learning) - horizontal flips, random crops …

• under-used in data space

• may need some thought on perturbation models for your domain

[Figure: real vs synthesized examples]

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
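For physical-world data, augmentation might mean perturbing real traces under a domain-plausible noise model, analogous to flips and crops for images. A toy sketch; the dimmer trace, noise scale, and function name are all assumptions you would replace with your own perturbation model:

```python
# Augment a small sensor trace by adding Gaussian jitter to each
# reading. The noise scale is an assumption to tune per domain,
# analogous to choosing flip/crop parameters for images.
import random

def augment(trace, n_copies=3, noise=0.05, seed=0):
    rng = random.Random(seed)            # fixed seed -> reproducible
    out = []
    for _ in range(n_copies):
        out.append([v + rng.gauss(0, noise * max(abs(v), 1.0))
                    for v in trace])
    return out

dimmer_levels = [0.0, 0.25, 0.5, 1.0]    # toy 'real' trace
synthetic = augment(dimmer_levels)       # 3 perturbed copies
```

The point of the slide survives the toy: a handful of real traces can be stretched into a usable training set, provided the perturbation model is thought through for the domain.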

In short ..

• big data success - equal parts tech & non-tech

• solving right problem, not just problem right

• revisit problem, and what success means

@venuv62 venu.vasudevan@nextio.co
