
Master Analytics Data Solution ~ Multiple Channels

DRAFTED in the social domain for now; the remaining channels are still under further completion

Copy for Mr. Gary Chin, 2015-02-06, prepared by Teng Xiaolu

Draw out the analytics framework as:

Data Solution = 1. (Statistics Model + Machine Learning) + 2. (Strategy Insights + Metrics Schema + Innovation Tech)

Given the intimidating abundance of realms involved, it can be split into two major parts:

Bracket 1. Riding on the foundation of methodology, propagate algorithm techniques and statistical tests.

Bracket 2. Based on the data responses, decisions led by data analysis can be made through the entanglement of insights, measurements, and addictive innovations [fig.3].

Machine Learning

In general, at a glance of machine learning (collected from blog discussions, for your reference), the guidance is a process with three emphases: train, tune, test. Basically you have three data sets: training, validation and testing. You train the classifier using the training set, tune the parameters using the validation set, and then test the performance of your classifier on the unseen test set. Regarding the test vs. training data size, I have seen different versions, 30% : 70% or 10% : 90%; probably there is no single way to choose. Does it eliminate classification bias? What does that mean for possible generalization?

A well-accepted method is N-fold cross validation, in which you randomize the dataset and create N (almost) equal-size partitions. Then choose the Nth partition for testing and the remaining N-1 partitions for training the classifier. Within the training set you can further employ another K-fold cross validation to create a validation set and find the best parameters. Repeat this process N times to get an average of the metric.

Key words: unbiased, cross-validation, randomized data, average of the metric
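A minimal sketch of the train, tune, test flow described above, assuming scikit-learn and a placeholder dataset; the classifier, the parameter grid, and the 5-fold outer / 3-fold inner counts are illustrative assumptions rather than anything specified in this draft:

    # Minimal sketch: N-fold outer CV as the unseen test, K-fold inner CV for tuning.
    # Dataset, classifier and parameter grid are placeholders, not recommendations.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Inner K-fold CV plays the role of the validation set: it tunes C.
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    tuned_clf = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=inner_cv,
    )

    # Outer N-fold CV holds out each partition once as the unseen test set.
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    scores = cross_val_score(tuned_clf, X, y, cv=outer_cv)

    print("per-fold accuracy:", np.round(scores, 3))
    print("average of the metric:", scores.mean())

The shuffle=True flags correspond to the "randomized data" keyword above, and the final mean is the "average of the metric".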


Strategy Insights + Metrics Schema (Social genre)

In the sense of social mining, social channels employ the first approach to deliver sophisticated insights; it is also the best place to derive market distinction. This first section adds social listening:

Which fans of the network are identified as influential nodes, and what fraction of the total fan base they account for; in particular, these scattered nodes are split across the diversified layers of the network. How frequently reactions are directed at the posts, which can be classified into volumes by follower size. How to figure out the overlapping area between various communities that share common high-interest hashtags. (No expanded version yet.) This stands against a static view of attributes.

Source: MYTH-BUSTING SOCIAL MEDIA ADVERTISING
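A small illustrative sketch of the three listening measurements above (influential nodes and their fraction of the fan base, reaction volumes by follower tier, hashtag overlap between communities), assuming networkx and entirely hypothetical data; the graph, threshold, counts and hashtags are assumptions for illustration only:

    # Illustrative sketch on hypothetical data: influential fan nodes, reaction
    # volumes bucketed by follower size, and hashtag-based community overlap.
    import networkx as nx

    # Hypothetical fan network: nodes are fans, edges are interactions.
    G = nx.karate_club_graph()

    # 1) Influential nodes and the fraction of the total fan base they represent.
    centrality = nx.degree_centrality(G)
    influencers = [n for n, c in centrality.items() if c >= 0.2]  # arbitrary threshold
    print("influencer fraction:", len(influencers) / G.number_of_nodes())

    # 2) Reaction frequency bucketed by follower size (hypothetical counts).
    reactions_by_tier = {"small": 120, "medium": 340, "large": 910}
    print("reactions by follower tier:", reactions_by_tier)

    # 3) Overlap between two communities sharing high-interest hashtags (Jaccard).
    tags_a = {"#superbowl", "#ads", "#tv"}
    tags_b = {"#superbowl", "#recipes", "#ads"}
    print("hashtag overlap:", len(tags_a & tags_b) / len(tags_a | tags_b))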


Source: nielsen-cross-platform-report-march-2014.pdf

Feature selection can be kept separate, or folded into scenarios.

Statistic Model

As long as the social mining is fulfilled, it can be arrayed into a digital model. I would suggest running Logistic Regression, Decision Tree and Neural Network together, considering the complementary effect of these three functional classifiers. Previously, it was an unavoidable, daunting task to discover, over a certain period, which among the flourishing classification techniques should be used. Now it is possible to identify the limitations to be removed while maximizing the strengths; for instance, tolerance of missing data is found in the decision tree, which in turn helps tackle the black box found in the neural network. Nonetheless, the neural network tends toward a high allowance for less-restricted features and tolerance of highly interdependent attributes, and it ends up not knowing what is predicted or why it is predicted. (Collected from blog discussions, for your reference.)

Determine the number of neurons:

• The VC dimension provides a rule of thumb for the number of neurons. Basically it states that the number of free parameters should be much less than the number of examples in your training set. "Free parameters" translates to the number of connections in your neural net that need to be tuned, which in a fully connected net depends on the number of neurons and how many of them are in the input layer vs the hidden layer. [1] (A small numeric sketch of this rule of thumb follows this list.)

• In general, with a large dataset, the more parameters the better. Regularization can prevent overfitting. The structure of the neural net is also critical, and actually determines the number of parameters (which corresponds much more to the number of connections). The most popular architectures these days use many (e.g. 10) "layers" of neurons, and/or feedback connections (see recurrent neural nets, now almost always using LSTM). So in short, #neurons <<< #examples in training set. [1] Notice how low-dimensional examples become a positive thing here.
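A small numeric sketch of that rule of thumb for a single-hidden-layer, fully connected net; the layer sizes and training-set size below are invented for illustration:

    # Rule-of-thumb check: free parameters (connections + biases) should be
    # much smaller than the number of training examples. All sizes are invented.
    def free_parameters(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
        weights = n_inputs * n_hidden + n_hidden * n_outputs
        biases = n_hidden + n_outputs
        return weights + biases

    n_train = 50_000                                  # hypothetical training-set size
    params = free_parameters(n_inputs=40, n_hidden=25, n_outputs=1)
    print(params, "free parameters vs", n_train, "training examples")
    print("rule of thumb satisfied:", params * 10 < n_train)  # '<<' read loosely as 10x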

Good to understand neural networks. THINKING: a hybrid solution is suggested in the current version, together with the paper in [fig.1]. Despite the continued lack of evidence to define how much weight to put on learning speed and data consumption, at this moment I would support phasing this operation from non-clicks → clicks.

[fig.1] Neural networks are routinely ignored as a modeling tool because they are largely uninterpretable overall and are generally less familiar to analysts and business people alike. Neural networks can provide great diagnostic insights into the potential shortcomings of other modeling methods, and comparing the results of different models can help identify what is needed to improve model performance.

For example, consider a situation where the best tree model fits poorly, but the best neural network model and the best regression model perform similarly well on the validation data. Had the analyst not considered using a neural network, little performance would be lost by investigating only the regression model. Consider a similar situation where the best tree fits poorly and the best regression fits somewhat better, but the best neural network shows marked improvement over the regression model. The poor tree fit might indicate that the relationship between the predictors and the response changes smoothly. The improvement of the neural network over the regression indicates that the regression model is not capturing the complexity of the relationship between the predictors and the response. Without the neural network results, the regression model would be chosen and much interpretation would go into interpreting a model that inadequately describes the relationship. Even if the neural network is not a candidate to present to the final client or management team, the neural network can be highly diagnostic for other modeling approaches.

In another situation, the best tree model and the best neural network model might be performing well, but the regression model is performing somewhat poorly. In this case, the relative interpretability of the tree might lead to its selection, but the neural network fit confirms that the tree model adequately summarizes the relationship. In yet another scenario, the tree is performing very well relative to both the neural network and regression models. This scenario might imply that there are certain variables that behave unusually with respect to the response when a missing value is present. Because trees can handle missing values directly, they are able to differentiate between a missing value and a value that has been imputed for use in a regression or neural network model. In this case, it might make more sense to investigate missing value indicators rather than to look at increasing the flexibility of the regression model because the neural network shows that this improved flexibility does not improve the fit.

To overcome this problem, select variables judiciously and fit a neural network while ensuring that there is an adequate amount of data in the validation data set. As discussed earlier, performing variable selection in a variety of ways ensures that important variables are included. Evaluate the models fit by decision tree, regression, and neural network methods to better understand the relationships in the data, and use this information to identify ways to improve the overall fit.

Source: <Identifying and Overcoming Common Data Mining Mistakes>
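A minimal sketch of running the three suggested classifiers side by side on one shared validation split, assuming scikit-learn and a placeholder dataset; the model settings are illustrative, not a recommended configuration:

    # Side-by-side comparison of the three suggested classifiers on one
    # shared validation split (dataset and settings are placeholders).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "neural network": MLPClassifier(hidden_layer_sizes=(25,), max_iter=1000, random_state=0),
    }

    # Large gaps between these scores are the diagnostic signal described above,
    # e.g. a strong neural net vs a weak regression hints at missed non-linearity.
    for name, model in models.items():
        score = model.fit(X_train, y_train).score(X_val, y_val)
        print(f"{name:20s} validation accuracy: {score:.3f}")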

From the other book,

“However, a neural network is a “black box” method that does not provide any interpretable explanation to accompany its classifications or predictions. Adjusting the parameters to tune the neural network performance is largely a matter of trial and error guided by rules of thumb and user experience.”
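To make that trial and error slightly more systematic, a hedged sketch of a small grid search over a few common MLP knobs; the grid values are arbitrary examples, not a recommendation:

    # Small grid search over common MLP knobs; the grid is arbitrary and only
    # illustrates structured trial and error, not a recommended recipe.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=1500, n_features=20, random_state=0)

    grid = {
        "hidden_layer_sizes": [(10,), (25,), (25, 10)],
        "alpha": [1e-4, 1e-3, 1e-2],          # L2 regularization strength
        "learning_rate_init": [1e-3, 1e-2],
    }
    search = GridSearchCV(MLPClassifier(max_iter=1000, random_state=0), grid, cv=3)
    search.fit(X, y)
    print("best parameters:", search.best_params_)
    print("best CV accuracy:", round(search.best_score_, 3))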

{SIDE NOTE}

Inspired by listening → imitation → recode, I would like to believe the other tuple requires heterogeneous interpretation with a discriminant effect. Probably it requires Naïve Bayes and VMC to iterate stringently. Please kindly note that independent feature selection to support the formula could be packed into scenarios.

 

About Naïve Bayes, in a few paragraphs:

• The second contribution is a technical contribution: We introduce a version of Naïve Bayes with a multivariate event model that can scale up efficiently to massive, sparse datasets. Specifically, this version of the commonly used multivariate Bernoulli Naïve Bayes only needs to consider the "active" elements of the dataset—those that are present or non-zero—which can be a tiny fraction of the elements in the matrix for massive, sparse data. This means that predictive modelers wanting to work with the very convenient Naïve Bayes algorithm are not forced to use the multinomial event model simply because it is more scalable. This article thereby makes a small but important addition to the cumulative answer to a current open research question17:

• How can we learn predictive models from lots of data?

• Note that our use of Naïve Bayes should not be interpreted as a claim that Naïve Bayes is by any means the best modeling technique for these data. Other methods exist that handle large transactional datasets, such as the popular Vowpal Wabbit software based on scalable stochastic gradient descent and input hashing.2,18,19 Moreover, results based on Naïve Bayes are conservative. As one would expect theoretically20 and as shown empirically,15 nonlinear modeling and less-restrictive linear modeling generally will show continued improvements in predictive performance for much larger datasets than will Naïve Bayes modeling. (However, how to conduct robust, effective nonlinear modeling with massive high-dimensional data is still an open question.) Nevertheless, Naïve Bayes is popular and quite robust. Using it provides a clear and conservative baseline to demonstrate the point of the article. If we see continued improvements when scaling up Naïve Bayes to massive data, we should expect even greater improvements when scaling up more sophisticated induction algorithms.

• These results are important because they help provide some solid empirical grounding to the importance of big data for predictive analytics and highlight a particular sort of data in which predictive analytics is likely to benefit from big data. They also add to the observation3 that firms (or other entities) with massive data assets21 may indeed have a considerable competitive advantage over firms with smaller data assets.

Source: big.2013.0037.pdf <Is Bigger Really Better?>

More discussion from the paper about digital data: occurrences are sparse and fine-grained, and also massive → more data actually beats algorithms.
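A hedged sketch of the "active elements" point: in a sparse matrix only the non-zero entries are stored, and scikit-learn's BernoulliNB accepts such input directly. The toy matrix and the random labels below are assumptions for illustration; this is not the paper's own implementation:

    # Toy illustration of multivariate Bernoulli Naive Bayes on sparse data:
    # the CSR matrix stores only the non-zero ("active") elements.
    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.naive_bayes import BernoulliNB

    n_rows, n_cols = 10_000, 5_000
    X = sparse_random(n_rows, n_cols, density=0.001, format="csr", random_state=0)
    X.data[:] = 1.0                               # binarize: feature present / absent
    y = np.random.default_rng(0).integers(0, 2, size=n_rows)  # random labels, illustrative only

    print("stored elements:", X.nnz, "of", n_rows * n_cols)   # tiny 'active' fraction
    model = BernoulliNB(binarize=None).fit(X, y)  # binarize=None: data is already 0/1
    # Accuracy is ~chance because the labels are random; the point is only that the
    # fit works off the sparse structure rather than a dense 10,000 x 5,000 matrix.
    print("training accuracy:", round(model.score(X, y), 3))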

[fig.2] Dynamic Programming Source: https://www.cs.utexas.edu/~eladlieb/RLRG.html

http://theanalyticsstore.com/deep-learning/


INNOVATION is a case combining tech advances, TV + Social, which probably arrives afterwards when seen from a top-to-bottom aspect. I have seen a similar opinion somewhere; I will point out the specific article later for your convenience, given time constraints. I would support the tool kit, 0 → -1.


The same goes for the real-time approach, which attempts to roll out under specific real-time recency metrics, particularly linked to the data stream by hour, day, week and month (refer to ++Insight+1++). Functionally it needs to run in parallel with brand metrics: awareness and retention rate. With that, it won't cover too much about impressions, projected ROI, estimated cost of prospect acquisition = estimated margin per prospect / (1 + ROI threshold), or CLTV (new deducted from existing) within the monothetic statistical tests, which include a longer list: p-value, F-test, t-test, R², adjusted R², correlation matrix, elasticity and coefficients to validate functionally, and type I/II errors. MAPE, error-rate management, ROC and lift depend on the model selected, and more likely time series, association and what-if analysis. (Note: the longer treatment lives in the book(s), thicker and thicker, every fraction, self-semester.)

There is an analytics session named transaction analysis: RFM, discerning acquisition → transaction. It illustrates the possibilities: under a conditional setting, the click-but-no-purchase group might be stimulated into a longer relationship with brands that offer coupons, probably a variation on cost sensitivity. A model helps to recognize this type of segment with parameters. In contrast, the purchased group can alternately be encouraged toward repeat purchase, due to demand shifted by cross-sell and up-sell analysis. Both upstream and downstream could be covered.

How far does the overarching digital intrinsic relevance go, by channel or by touchpoint? TV, plus time-shifted TV, is diminishing; even with capping at 2, on or off should leave this motion out in certain scenarios, for instance a customer scoring program. Social network analysis plays throughout the scenarios estimating customer profitability, listening → imitation → recode, on the path to extrapolation, both supervised and unsupervised.
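Two small illustrations of the numbers above, assuming pandas and an invented transactions table; the column names, margin and ROI-threshold values are assumptions for the example only:

    # Hypothetical example: the prospect-cost formula and a tiny RFM scoring pass.
    import pandas as pd

    # est. cost of prospect gaining = estimated margin per prospect / (1 + ROI threshold)
    margin_per_prospect = 50.0          # assumed
    roi_threshold = 0.25                # assumed
    print("max acquisition cost per prospect:", margin_per_prospect / (1 + roi_threshold))  # 40.0

    # RFM on an invented transactions table (customer_id, order_date, amount).
    tx = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3, 3],
        "order_date": pd.to_datetime(["2015-01-05", "2015-02-01", "2014-11-20",
                                      "2015-01-15", "2015-01-25", "2015-02-03"]),
        "amount": [20.0, 35.0, 15.0, 60.0, 22.0, 18.0],
    })
    as_of = pd.Timestamp("2015-02-06")
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (as_of - d.max()).days),  # days since last order
        frequency=("order_date", "count"),                         # number of orders
        monetary=("amount", "sum"),                                # total spend
    )
    print(rfm)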


Source: Bayesian+reasoning+and+machine+learning.pdf

NEWS: In particular, they want to see highly granular data from all touchpoints. "Increasing the granularity and variability of media inputs can increase the estimate of a medium's RoI by as much as 27%," they reported. They also highlighted the "shocking oversight" when it comes to measuring creativity, with some observers claiming that 70% of the sales effectiveness of advertising can be attributed to the creative message. Acknowledging that this is a difficult area, they argued that more direct integration of copy tests into marketing mix models would move the industry on from determining which ads worked to understanding why they worked.

Source: New marketing models emerge, London: 6 February 2015

http://www.warc.com/LatestNews/News/EmailNews.news?ID=34271&Origin=WARCNewsEmail&CID=N34271&PUB=Warc_News&utm_source=WarcNews&utm_medium=email&utm_campaign=WarcNews20150206


Other gifts from London

What else did Winston say? :>>

“During these turbulent times, predictive analytics is how smart companies are turning data into knowledge to gain a competitive advantage.” Source: <Drive your business with predictive analytics>


Source: <Drive your business with predictive analytics>

THINKING from the Facebook case: it might be two-way. TV doesn't simply dominate in influencing social responses and trends; on the contrary, the social platform reflects TV opportunities, as Facebook leverages the significance of the Super Bowl. It's a typical event-show pattern vs. proportion vs. longer-viewership extension, added with transaction history to capture higher-value customers. [fig.3]

• January 30, 2015, 1:53 PM
• Facebook’s new Super Bowl ad play
• By Zak Stambor, Managing Editor
• The social network will launch a live feed where fans can discuss the game, and it is selling video ads that target consumers based on what they talk about. Among those signing up to advertise are Toyota, Pepsi, Intuit TurboTax and Anheuser-Busch.

• Facebook Inc. wants to be on consumers’ second screen during the Super Bowl.

• The social network will launch a Super Bowl-specific feed during the game where consumers can comment on the game—and the surrounding hoopla around it, including ads. And advertisers can target consumers within the feed based on what participants are discussing.

• Among the brands that plan to advertise within Facebook’s feed are Toyota, Pepsi, Intuit TurboTax and Anheuser-Busch. Each of those brands is also running ads during the game’s TV broadcast.

• Using Facebook, as well as other digital channels, to amplify a costly ad buy is an essential part of advertising strategy in today’s media climate, says Rebecca Lieb, an analyst at the business research and advisory firm Altimeter Group.

• “Brands are in a position where making corresponding web and social ad buys is de rigueur,” she says. “Why would you invest all the time and money in a Super Bowl ad and give it the lifespan of a fruit fly by letting it begin and end on broadcast TV?”

• This year 30 seconds of Super Bowl air time costs advertisers $4.5 million, according to Variety. That doesn’t begin to factor in production costs, which can also be extremely costly, Lieb says.

• In addition to letting large advertisers amplify their Super Bowl campaigns, the feed will also let smaller marketers, including e-retailers, use attention-grabbing ads to be a part of consumers’ Super Bowl discussion, says Lou Kerner, a social media analyst and investor at The Social Internet Fund.

• While Twitter is often thought of as the social network consumers engage with while watching TV, its audience is roughly one-fifth the size of Facebook’s, Lieb says. Twitter has 284 million monthly active users—and only 63 million in the United States—compared to Facebook, which has 1.393 billion monthly active users, including 208 million in the United States and Canada (Facebook doesn’t release a U.S.-only figure).

• “There’s never been a medium as big as Facebook,” Lieb says. “Now clearly not all of Facebook’s users are Americans, not all of those American users are football fans, but there are millions and millions of people who represent a very large potential audience for advertisers,” she says. While TV gives advertisers a tool to reach a wide swath of consumers, Facebook gives them an even bigger audience that they can finely target, she says.

• Facebook recognizes this and is emphasizing to potential advertisers that, in addition to football fans, they can reach people discussing party planning, sharing recipes, buying a new flat-screen TV, the half-time show or chattering about ads, a spokeswoman says. Facebook declined to say what it is charging marketers to advertise in the Super Bowl feed.

• While 115 million U.S. consumers watched the Super Bowl last year, Facebook says 170 million people saw Super Bowl-related posts and ads last year. By developing a dedicated feed, Facebook aims to grow that number.

Source: https://www.internetretailer.com/2015/01/30/facebooks-new-super-bowl-ad-play

 

++Insight+1++ from Nielsen, Spredfast, Rentrak:

We also know that 40% of U.S. tablet and smartphone users visit a social network while watching TV. Five of the top 10 primetime TV shows integrate social media online and/or on-air: NBC Sunday Night Football, both nights of The Voice, and both nights of X Factor. In addition, Spredfast reaches 135 million people each week through our on-air social visualizations, which is 40% of the U.S. population. Rentrak’s scale allows us to sell on cycles up to 28 days for most shows because we have tremendous coverage across users.


Reading for more references:
Nielsen-cross-platform-report-march-2014.pdf
Do display ad influence search.pdf
Tech Trends 2014 Inspiring Disruption – Deloitte.pdf
Accenture_Technology_Vision_2014.pdf
Social_Shopping_2011_Brief1.pdf
Social_Media_Analytics_-_Sample_report_-_Marketing_effectiveness.pdf
13926_di_social_q413_v5.pdf