
Master Analytics Data Solution ~ Multiple Channels

DRAFTED in the social domain for now; the remaining channels are still under further completion

Copy for Mr. Gary Chin, 2015-02-06, prepared by Teng Xiaolu

Draw out the analytics framework as:

Data Solution = 1. (Statistics Model + Machine Learning) + 2. (Strategy Insights + Metrics Schema + Innovation Tech)

Given the intimidating abundance of realms involved, it can be split into two major parts:

Bracket 1. Riding on the foundation of methodology, propagate algorithm techniques and statistical tests.

Bracket 2. Based on the data responses, decisions led by data analysis can be made through the entanglement of insights, measurements, and addictive innovations [fig.3].

Machine Learning

In general, at a glance of machine learning (collected from blog discussions, for your reference), the guidance is a process with three emphases: train, tune, test. Basically you have three data sets: training, validation and testing. You train the classifier using the training set, tune the parameters using the validation set, and then test the performance of your classifier on the unseen test set. Regarding the test vs. training data size, I have seen different versions, 30% : 70% or 10% : 90%; probably there is no single way to choose. Does it eliminate classification bias? What does that mean for possible generalization?

A well-accepted method is N-fold cross validation, in which you randomize the dataset and create N (almost) equal-size partitions. Then choose the Nth partition for testing and the remaining N-1 partitions for training the classifier. Within the training set you can further employ another K-fold cross validation to create a validation set and find the best parameters. Repeat this process N times to get an average of the metric.

Key words: unbiased, cross-validation, randomized data, average of the metric
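A minimal sketch of the train, tune, test flow described above, assuming scikit-learn and a placeholder dataset; the classifier, the parameter grid, and the 5-fold outer / 3-fold inner counts are illustrative assumptions rather than anything specified in this draft:

    # Minimal sketch: N-fold outer CV as the unseen test, K-fold inner CV for tuning.
    # Dataset, classifier and parameter grid are placeholders, not recommendations.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Inner K-fold CV plays the role of the validation set: it tunes C.
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    tuned_clf = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=inner_cv,
    )

    # Outer N-fold CV holds out each partition once as the unseen test set.
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    scores = cross_val_score(tuned_clf, X, y, cv=outer_cv)

    print("per-fold accuracy:", np.round(scores, 3))
    print("average of the metric:", scores.mean())

The shuffle=True flags correspond to the "randomized data" keyword above, and the final mean is the "average of the metric".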


Strategy Insights + Metrics Schema (Social genre)

In the sense of social mining, social channels employ the first approach to deliver sophisticated insights; it is also the best place to derive market distinction. This first section adds social listening:

Which fans of the network are identified as influential nodes, and what fraction of the total fan base they account for; in particular, these scattered nodes are split across the diversified layers of the network. How frequently reactions are directed at the posts, which can be classified into volumes by follower size. How to figure out the overlapping area between various communities that share common high-interest hashtags. (No expanded version yet.) This stands against a static view of attributes.

Source: MYTH-BUSTING SOCIAL MEDIA ADVERTISING
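A small illustrative sketch of the three listening measurements above (influential nodes and their fraction of the fan base, reaction volumes by follower tier, hashtag overlap between communities), assuming networkx and entirely hypothetical data; the graph, threshold, counts and hashtags are assumptions for illustration only:

    # Illustrative sketch on hypothetical data: influential fan nodes, reaction
    # volumes bucketed by follower size, and hashtag-based community overlap.
    import networkx as nx

    # Hypothetical fan network: nodes are fans, edges are interactions.
    G = nx.karate_club_graph()

    # 1) Influential nodes and the fraction of the total fan base they represent.
    centrality = nx.degree_centrality(G)
    influencers = [n for n, c in centrality.items() if c >= 0.2]  # arbitrary threshold
    print("influencer fraction:", len(influencers) / G.number_of_nodes())

    # 2) Reaction frequency bucketed by follower size (hypothetical counts).
    reactions_by_tier = {"small": 120, "medium": 340, "large": 910}
    print("reactions by follower tier:", reactions_by_tier)

    # 3) Overlap between two communities sharing high-interest hashtags (Jaccard).
    tags_a = {"#superbowl", "#ads", "#tv"}
    tags_b = {"#superbowl", "#recipes", "#ads"}
    print("hashtag overlap:", len(tags_a & tags_b) / len(tags_a | tags_b))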


Source: nielsen-cross-platform-report-march-2014.pdf

Feature selection can be kept separate, or folded into scenarios.

Statistic Model

As long as the social mining is fulfilled, it can be arrayed into a digital model. I would suggest running Logistic Regression, Decision Tree and Neural Network together, considering the complementary effect of these three functional classifiers. Previously, it was an unavoidable, daunting task to discover, over a certain period, which among the flourishing classification techniques should be used. Now it is possible to identify the limitations to be removed while maximizing the strengths; for instance, tolerance of missing data is found in the decision tree, which in turn helps tackle the black box found in the neural network. Nonetheless, the neural network tends toward a high allowance for less-restricted features and tolerance of highly interdependent attributes, and it ends up not knowing what is predicted or why it is predicted. (Collected from blog discussions, for your reference.)

Determine the number of neurons:

• The VC dimension provides a rule of thumb for the number of neurons. Basically it states that the number of free parameters should be much less than the number of examples in your training set. "Free parameters" translates to the number of connections in your neural net that need to be tuned, which in a fully connected net depends on the number of neurons and how many of them are in the input layer vs the hidden layer. [1] (A small numeric sketch of this rule of thumb follows this list.)

• In general, with a large dataset, the more parameters the better. Regularization can prevent overfitting. The structure of the neural net is also critical, and actually determines the number of parameters (which corresponds much more to the number of connections). The most popular architectures these days use many (e.g. 10) "layers" of neurons, and/or feedback connections (see recurrent neural nets, now almost always using LSTM). So in short, #neurons <<< #examples in training set. [1] Notice how low-dimensional examples become a positive thing here.
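A small numeric sketch of that rule of thumb for a single-hidden-layer, fully connected net; the layer sizes and training-set size below are invented for illustration:

    # Rule-of-thumb check: free parameters (connections + biases) should be
    # much smaller than the number of training examples. All sizes are invented.
    def free_parameters(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
        weights = n_inputs * n_hidden + n_hidden * n_outputs
        biases = n_hidden + n_outputs
        return weights + biases

    n_train = 50_000                                  # hypothetical training-set size
    params = free_parameters(n_inputs=40, n_hidden=25, n_outputs=1)
    print(params, "free parameters vs", n_train, "training examples")
    print("rule of thumb satisfied:", params * 10 < n_train)  # '<<' read loosely as 10x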

Good to understand neural networks. THINKING: a hybrid solution is suggested in the current version, together with the paper in [fig.1]. Despite the continued lack of evidence to define how much weight to put on learning speed and data consumption, at this moment I would support phasing this operation from non-clicks → clicks.

[fig.1] Neural networks are routinely ignored as a modeling tool because they are largely uninterpretable overall and are generally less familiar to analysts and business people alike. Neural networks can provide great diagnostic insights into the potential shortcomings of other modeling methods, and comparing the results of different models can help identify what is needed to improve model performance.

For example, consider a situation where the best tree model fits poorly, but the best neural network model and the best regression model perform similarly well on the validation data. Had the analyst not considered using a neural network, little performance would be lost by investigating only the regression model. Consider a similar situation where the best tree fits poorly and the best regression fits somewhat better, but the best neural network shows marked improvement over the regression model. The poor tree fit might indicate that the relationship between the predictors and the response changes smoothly. The improvement of the neural network over the regression indicates that the regression model is not capturing the complexity of the relationship between the predictors and the response. Without the neural network results, the regression model would be chosen and much interpretation would go into interpreting a model that inadequately describes the relationship. Even if the neural network is not a candidate to present to the final client or management team, the neural network can be highly diagnostic for other modeling approaches.

In another situation, the best tree model and the best neural network model might be performing well, but the regression model is performing somewhat poorly. In this case, the relative interpretability of the tree might lead to its selection, but the neural network fit confirms that the tree model adequately summarizes the relationship. In yet another scenario, the tree is performing very well relative to both the neural network and regression models. This scenario might imply that there are certain variables that behave unusually with respect to the response when a missing value is present. Because trees can handle missing values directly, they are able to differentiate between a missing value and a value that has been imputed for use in a regression or neural network model. In this case, it might make more sense to investigate missing value indicators rather than to look at increasing the flexibility of the regression model because the neural network shows that this improved flexibility does not improve the fit.

To overcome this problem, select variables judiciously and fit a neural network while ensuring that there is an adequate amount of data in the validation data set. As discussed earlier, performing variable selection in a variety of ways ensures that important variables are included. Evaluate the models fit by decision tree, regression, and neural network methods to better understand the relationships in the data, and use this information to identify ways to improve the overall fit.

Source: <Identifying and Overcoming Common Data Mining Mistakes>
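A minimal sketch of running the three suggested classifiers side by side on one shared validation split, assuming scikit-learn and a placeholder dataset; the model settings are illustrative, not a recommended configuration:

    # Side-by-side comparison of the three suggested classifiers on one
    # shared validation split (dataset and settings are placeholders).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "neural network": MLPClassifier(hidden_layer_sizes=(25,), max_iter=1000, random_state=0),
    }

    # Large gaps between these scores are the diagnostic signal described above,
    # e.g. a strong neural net vs a weak regression hints at missed non-linearity.
    for name, model in models.items():
        score = model.fit(X_train, y_train).score(X_val, y_val)
        print(f"{name:20s} validation accuracy: {score:.3f}")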

From the other book,

“However, a neural network is a “black box” method that does not provide any interpretable explanation to accompany its classifications or predictions. Adjusting the parameters to tune the neural network performance is largely a matter of trial and error guided by rules of thumb and user experience.”
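To make that trial and error slightly more systematic, a hedged sketch of a small grid search over a few common MLP knobs; the grid values are arbitrary examples, not a recommendation:

    # Small grid search over common MLP knobs; the grid is arbitrary and only
    # illustrates structured trial and error, not a recommended recipe.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1500, n_features=20, random_state=0)

    grid = {
        "hidden_layer_sizes": [(10,), (25,), (25, 10)],
        "alpha": [1e-4, 1e-3, 1e-2],          # L2 regularization strength
        "learning_rate_init": [1e-3, 1e-2],
    }
    search = GridSearchCV(MLPClassifier(max_iter=1000, random_state=0), grid, cv=3)
    search.fit(X, y)
    print("best parameters:", search.best_params_)
    print("best CV accuracy:", round(search.best_score_, 3))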

{SIDE NOTE}

Inspired by listening → imitation → recode, I would like to believe the other tuple requires heterogeneous interpretation with a discriminant effect. Probably it requires Naïve Bayes and VMC to iterate stringently. Please kindly note that independent feature selection to support the formula could be packed into scenarios.

 

About Naïve Bayes, in a few paragraphs:

• The second contribution is a technical contribution: We introduce a version of Naïve Bayes with a multivariate event model that can scale up efficiently to massive, sparse datasets. Specifically, this version of the commonly used multivariate Bernoulli Naïve Bayes only needs to consider the "active" elements of the dataset—those that are present or non-zero—which can be a tiny fraction of the elements in the matrix for massive, sparse data. This means that predictive modelers wanting to work with the very convenient Naïve Bayes algorithm are not forced to use the multinomial event model simply because it is more scalable. This article thereby makes a small but important addition to the cumulative answer to a current open research question17:

• How can we learn predictive models from lots of data?

• Note that our use of Naïve Bayes should not be interpreted as a claim that Naïve Bayes is by any means the best modeling technique for these data. Other methods exist that handle large transactional datasets, such as the popular Vowpal Wabbit software based on scalable stochastic gradient descent and input hashing.2,18,19 Moreover, results based on Naïve Bayes are conservative. As one would expect theoretically20 and as shown empirically,15 nonlinear modeling and less-restrictive linear modeling generally will show continued improvements in predictive performance for much larger datasets than will Naïve Bayes modeling. (However, how to conduct robust, effective nonlinear modeling with massive high-dimensional data is still an open question.) Nevertheless, Naïve Bayes is popular and quite robust. Using it provides a clear and conservative baseline to demonstrate the point of the article. If we see continued improvements when scaling up Naïve Bayes to massive data, we should expect even greater improvements when scaling up more sophisticated induction algorithms.

• These results are important because they help provide some solid empirical grounding to the importance of big data for predictive analytics and highlight a particular sort of data in which predictive analytics is likely to benefit from big data. They also add to the observation3 that firms (or other entities) with massive data assets21 may indeed have a considerable competitive advantage over firms with smaller data assets.

Source: big.2013.0037.pdf <Is Bigger Really Better?>

More discussion from the paper about digital data: occurrences are sparse and fine-grained, and also massive → more data actually beats algorithms.
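A hedged sketch of the "active elements" point: in a sparse matrix only the non-zero entries are stored, and scikit-learn's BernoulliNB accepts such input directly. The toy matrix and the random labels below are assumptions for illustration; this is not the paper's own implementation:

    # Toy illustration of multivariate Bernoulli Naive Bayes on sparse data:
    # the CSR matrix stores only the non-zero ("active") elements.
    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.naive_bayes import BernoulliNB

    n_rows, n_cols = 10_000, 5_000
    X = sparse_random(n_rows, n_cols, density=0.001, format="csr", random_state=0)
    X.data[:] = 1.0                               # binarize: feature present / absent
    y = np.random.default_rng(0).integers(0, 2, size=n_rows)  # random labels, illustrative only

    print("stored elements:", X.nnz, "of", n_rows * n_cols)   # tiny 'active' fraction
    model = BernoulliNB(binarize=None).fit(X, y)  # binarize=None: data is already 0/1
    # Accuracy is ~chance because the labels are random; the point is only that the
    # fit works off the sparse structure rather than a dense 10,000 x 5,000 matrix.
    print("training accuracy:", round(model.score(X, y), 3))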

[fig.2] Dynamic Programming Source: https://www.cs.utexas.edu/~eladlieb/RLRG.html

http://theanalyticsstore.com/deep-learning/


INNOVATION is a case combining tech advances, TV + Social, which probably arrives afterwards when seen from a top-to-bottom aspect. I have seen a similar opinion somewhere; I will point out the specific article later for your convenience, given time constraints. I would support the tool kit, 0 → -1.


The same goes for the real-time approach, which attempts to roll out under specific real-time recency metrics, particularly linked to the data stream by hour, day, week and month (refer to ++Insight+1++). Functionally it needs to run in parallel with brand metrics: awareness and retention rate. With that, it won't cover too much about impressions, projected ROI, estimated cost of prospect acquisition = estimated margin per prospect / (1 + ROI threshold), or CLTV (new deducted from existing) within the monothetic statistical tests, which include a longer list: p-value, F-test, t-test, R², adjusted R², correlation matrix, elasticity and coefficients to validate functionally, and type I/II errors. MAPE, error-rate management, ROC and lift depend on the model selected, and more likely time series, association and what-if analysis. (Note: the longer treatment lives in the book(s), thicker and thicker, every fraction, self-semester.)

There is an analytics session named transaction analysis: RFM, discerning acquisition → transaction. It illustrates the possibilities: under a conditional setting, the click-but-no-purchase group might be stimulated into a longer relationship with brands that offer coupons, probably a variation on cost sensitivity. A model helps to recognize this type of segment with parameters. In contrast, the purchased group can alternately be encouraged toward repeat purchase, due to demand shifted by cross-sell and up-sell analysis. Both upstream and downstream could be covered.

How far does the overarching digital intrinsic relevance go, by channel or by touchpoint? TV, plus time-shifted TV, is diminishing; even with capping at 2, on or off should leave this motion out in certain scenarios, for instance a customer scoring program. Social network analysis plays throughout the scenarios estimating customer profitability, listening → imitation → recode, on the path to extrapolation, both supervised and unsupervised.
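Two small illustrations of the numbers above, assuming pandas and an invented transactions table; the column names, margin and ROI-threshold values are assumptions for the example only:

    # Hypothetical example: the prospect-cost formula and a tiny RFM scoring pass.
    import pandas as pd

    # est. cost of prospect gaining = estimated margin per prospect / (1 + ROI threshold)
    margin_per_prospect = 50.0          # assumed
    roi_threshold = 0.25                # assumed
    print("max acquisition cost per prospect:", margin_per_prospect / (1 + roi_threshold))  # 40.0

    # RFM on an invented transactions table (customer_id, order_date, amount).
    tx = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3, 3],
        "order_date": pd.to_datetime(["2015-01-05", "2015-02-01", "2014-11-20",
                                      "2015-01-15", "2015-01-25", "2015-02-03"]),
        "amount": [20.0, 35.0, 15.0, 60.0, 22.0, 18.0],
    })
    as_of = pd.Timestamp("2015-02-06")
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (as_of - d.max()).days),  # days since last order
        frequency=("order_date", "count"),                         # number of orders
        monetary=("amount", "sum"),                                # total spend
    )
    print(rfm)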


Source: Bayesian+reasoning+and+machine+learning.pdf

NEWS: In particular, they want to see highly granular data from all touchpoints. "Increasing the granularity and variability of media inputs can increase the estimate of a medium's RoI by as much as 27%," they reported. They also highlighted the "shocking oversight" when it comes to measuring creativity, with some observers claiming that 70% of the sales effectiveness of advertising can be attributed to the creative message. Acknowledging that this is a difficult area, they argued that more direct integration of copy tests into marketing mix models would move the industry on from determining which ads worked to understanding why they worked.

Source: New marketing models emerge, London: 6 February 2015

http://www.warc.com/LatestNews/News/EmailNews.news?ID=34271&Origin=WARCNewsEmail&CID=N34271&PUB=Warc_News&utm_source=WarcNews&utm_medium=email&utm_campaign=WarcNews20150206


Other gifts from London

What else did Winston say? :>>

“During these turbulent times, predictive analytics is how smart companies are turning data into knowledge to gain a competitive advantage.” Source: <Drive your business with predictive analytics>


Source: <Drive your business with predictive analytics>

THINKING from the Facebook case: it might be two-way. TV doesn't simply dominate in influencing social responses and trends; on the contrary, the social platform reflects TV opportunities, as Facebook leverages the significance of the Super Bowl. It's a typical event-show pattern vs. proportion vs. longer-viewership extension, added with transaction history to capture higher-value customers. [fig.3]

• January 30, 2015, 1:53 PM
• Facebook’s new Super Bowl ad play
• By Zak Stambor, Managing Editor
• The social network will launch a live feed where fans can discuss the game, and it is selling video ads that target consumers based on what they talk about. Among those signing up to advertise are Toyota, Pepsi, Intuit TurboTax and Anheuser-Busch.

• Facebook Inc. wants to be on consumers’ second screen during the Super Bowl.

• The social network will launch a Super Bowl-specific feed during the game where consumers can comment on the game—and the surrounding hoopla around it, including ads. And advertisers can target consumers within the feed based on what participants are discussing.

• Among the brands that plan to advertise within Facebook’s feed are Toyota, Pepsi, Intuit TurboTax and Anheuser-Busch. Each of those brands is also running ads during the game’s TV broadcast.

• Using Facebook, as well as other digital channels, to amplify a costly ad buy is an essential part of advertising strategy in today’s media climate, says Rebecca Lieb, an analyst at the business research and advisory firm Altimeter Group.

• “Brands are in a position where making corresponding web and social ad buys is de rigueur,” she says. “Why would you invest all the time and money in a Super Bowl ad and give it the lifespan of a fruit fly by letting it begin and end on broadcast TV?”

• This year 30 seconds of Super Bowl air time costs advertisers $4.5 million, according to Variety. That doesn’t begin to factor in production costs, which can also be extremely costly, Lieb says.

• In addition to letting large advertisers amplify their Super Bowl campaigns, the feed will also let smaller marketers, including e-retailers, use attention-grabbing ads to be a part of consumers’ Super Bowl discussion, says Lou Kerner, a social media analyst and investor at The Social Internet Fund.

• While Twitter is often thought of as the social network consumers engage with while watching TV, its audience is roughly one-fifth the size of Facebook’s, Lieb says. Twitter has 284 million monthly active users—and only 63 million in the United States—compared to Facebook, which has 1.393 billion monthly active users, including 208 million in the United States and Canada (Facebook doesn’t release a U.S.-only figure).

• “There’s never been a medium as big as Facebook,” Lieb says. “Now clearly not all of Facebook’s users are Americans, not all of those American users are football fans, but there are millions and millions of people who represent a very large potential audience for advertisers,” she says. While TV gives advertisers a tool to reach a wide swath of consumers, Facebook gives them an even bigger audience that they can finely target, she says.

• Facebook recognizes this and is emphasizing to potential advertisers that, in addition to football fans, they can reach people discussing party planning, sharing recipes, buying a new flat-screen TV, the half-time show or chattering about ads, a spokeswoman says. Facebook declined to say what it is charging marketers to advertise in the Super Bowl feed.

• While 115 million U.S. consumers watched the Super Bowl last year, Facebook says 170 million people saw Super Bowl-related posts and ads last year. By developing a dedicated feed, Facebook aims to grow that number.

Source: https://www.internetretailer.com/2015/01/30/facebooks-new-super-bowl-ad-play

 

++Insight+1++ from Nielsen, Spredfast, Rentrak:

We also know that 40% of U.S. tablet and smartphone users visit a social network while watching TV. Five of the top 10 primetime TV shows integrate social media online and/or on-air: NBC Sunday Night Football, both nights of The Voice, and both nights of X Factor. In addition, Spredfast reaches 135 million people each week through our on-air social visualizations, which is 40% of the U.S. population. Rentrak’s scale allows us to sell on cycles up to 28 days for most shows because we have tremendous coverage across users.


Reading for more references:
Nielsen-cross-platform-report-march-2014.pdf
Do display ad influence search.pdf
Tech Trends 2014 Inspiring Disruption – Deloitte.pdf
Accenture_Technology_Vision_2014.pdf
Social_Shopping_2011_Brief1.pdf
Social_Media_Analytics_-_Sample_report_-_Marketing_effectiveness.pdf
13926_di_social_q413_v5.pdf