Linguistic Challenges Associated with

Monitoring Social Media

Dave Linabury, Jason Macemore, Gary Olson
Campbell-Ewald | October 2010

Copyright ©2010 Linabury, Macemore and Olson. Edited by Christopher Moritz and Helena Dobbins. All rights reserved.

Radian6, Chevrolet, OnStar, the United States Navy and Google retain the copyrights of their respective corporations. Campbell-Ewald uses the services of both Radian6 and Google, along with an internally developed proprietary set of tools, for gathering data on social media for its clients as well as its own internal data. Campbell-Ewald, part of the Interpublic Group of agencies, has been in the business of marketing communications and strategy since 1911.

Executive Summary

Social media monitoring tools such as Radian6, Sysomos and Scout have tremendous capabilities for pulling in highly targeted conversations taking place around a topic, person or brand from all social media platforms. These tools also enable a corporation or researcher to find the influencers around a topic.

While the targeting potential is astonishing, variations in languages, slang, regional idioms, misspellings and nicknames for topics and brands make accurate targeting difficult. What’s more, the influencers around a brand or topic are the most likely to use a nickname, slang term or personal parlance known only to their social circle. This makes understanding of these language variants critical.

Campbell-Ewald’s Social Media team has addressed these linguistic challenges in their own client monitoring projects over the past five years by:

• Determining current search trends around a topic. This not only identifies how users are searching (which indicates intent) but also aids in the identification of misspellings and relevant associated topics.

• Determining the age and gender of the writer through the use of various tools, knowledge of generational writing patterns, and comparison of regional variations against reliable reference sources of slang.

• Identifying the influencers and recording their linguistic patterns.

• Identifying emoticons and comparing them to known regional and generational variants.

This paper will detail these challenges, which are largely unknown to most users of these monitoring tools, in hopes that their own monitoring will be more accurate and complete.

Background

History

Campbell-Ewald has been an active participant in social media since early 2006. Their lead social media planners, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring tools such as Fat Pipe and Sentimentor. It was the development of these early tools, created to meet their own needs as researchers, that led to understanding the challenges raised in this paper.

Linabury and Macemore quickly discovered that monitoring tools were not always able to spider all of the conversations that were known to exist. At first, they theorized that conversations weren’t being pulled in because different coding methods and naming conventions for Web site sections made it difficult for the tools to parse data.

As new technologies made parsing data easier, the initial theory proved incorrect. In mid-2007 it was ascertained that linguistic variants were the cause.

Successes

Since the discovery of the linguistic variant sets, Campbell-Ewald has become the nation’s leader in social media monitoring. They have been tasked with monitoring data for several United States government agencies, including the United States Navy, the United States Mint, the United States Naval Academy, the FBI, the Centers for Disease Control and Prevention (CDC) and the Environmental Protection Agency (EPA).

In addition to government clients and projects, Campbell-Ewald’s social media team, under the leadership of Linabury, also provides monitoring for dozens of Fortune 500 clients while garnering numerous awards, such as a Gold Echo Award, a Silver Effie, Best Military Site of 2009 and Best Social Media Strategy, among others.

Target Market

Based on usage trends, the target audience for social media monitoring applications can be divided into two main segments: internal and external.

Categorically, corporations and public relations firms tend to use monitoring tools for internally driven ends. These typically include reputation management, crisis management and use as a clipping service to capture media mentions.

Keyword strategies for these approaches are typically limited to formal brand names, the CEO’s name and associated marketing terminology. They rarely take into consideration linguistic variations, context or subtle sentiment variations.

Conversely, advertising agencies, researchers and social media agencies tend to retain an external focus in their monitoring efforts, concentrating on sentiment analysis, brand perception, and marketing effectiveness and awareness.

External monitoring tends to consider contextual relevance far more than PR firms do, but most efforts still fail to incorporate (or even recognize the existence of) the language variants that need to be considered for accurate and inclusive brand monitoring in the social space.

Business Challenge

The User Base Grows Annually

Nearly 40% of corporations are turning to social monitoring to keep abreast of what’s being said about their brand. Many take this on internally, but most hire outside social media companies or agencies. However, virtually none of them are aware that they are not seeing the entire conversation; they blindly trust their chosen monitoring tool to fulfill their needs and find all of the relevant online discussions about their brand, product or services.

This is not the case. The tools are limited by the thoroughness of the tool’s operator and by how much time is spent determining appropriate keywords. Most administrators assume that the terms they use as marketing descriptors (e.g., marketing copy, search terms and PR copy) are enough.

Many monitoring tools are set up for the corporation by the tool manufacturer. It is highly unlikely that the tool creator could understand a brand as well as the employees, agencies or long-standing vendors of the corporation.

The reality is that marketing descriptors are generally one-sided and somewhat aspirational, and they rarely match customer expectations and perceptions. Few companies use keywords describing themselves as “cheap,” “average,” “acceptable,” “poor,” “pathetic,” “good enough,” etc.; however, those are precisely the terms consumers use with respect to brands. For proof of this, one need only see how brands are described at BrandTags.net, where tens of thousands of consumers have used those exact terms to describe hundreds of corporations in ever-growing tag clouds of user-generated terms.

Tag cloud about Chrysler from BrandTags.net

The Problem with Monitoring Language

Languages constantly evolve. They evolve nationally, regionally and hyper-locally. For example, a popular phrase among teens nationally to describe something amazing is “off the hook.” Regional variants such as “off the chain” [Detroit] and “off the heezy” [Brooklyn] exist as well. Hyper-locally, a neighborhood may have yet another variant, shared among friends but not generally known outside that block.

This presents unique challenges to the researcher who is using social media monitoring tools. If a phrase is known, it will be used as a key search term for the tool. If, however, more people are using lesser-known regional variants, the tool loses effectiveness.

1337speak

Several linguistic phenomena exist online that do not exist offline. One is the well-known variant called hacker speak or “1337speak” (elite speak). This variation goes back more than a decade online. It was developed by computer hackers in an effort to make their messages to each other difficult for outsiders to read. Words are deconstructed to their visual elements and replaced with alphanumeric and punctuation equivalents that bear a passing resemblance to the original letter form.

For example, a capital ‘T’ may be replaced with the number 7 or a + sign. The word ‘at’ will be replaced with the @ sign. Capital ‘E’ becomes a 3, and so on. There is no sequencing to the replacements; it is simply a matter of finding letters, numbers and punctuation that can be substituted. Indeed, cleverness is praised, and while online “1337speak generators” exist that “translate” text back and forth between English and 1337speak, each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt.

Here is a sample sentence, in English first, then in 1337speak:

“Time Magazine’s reporter had no idea what we were after.”

“71M3 M464z1n3’5 |23p0|273|2 H4| n0 1|34 wH47 w3 w3|23 4f73|2”

If hackers were discussing a new Intel processor in 1337speak, no monitoring tool would be able to pick up that conversation, as no complete English words exist in hacker speak for the tool to pick up.
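One way to narrow this gap is to normalize text before matching keywords against it. The following Python sketch is only an illustration under stated assumptions: the `LEET_MAP` table and the function names are hypothetical, the table covers just a handful of common substitutions, and it does not describe how any particular monitoring tool works.

```python
# Hypothetical substitution table: a small sample of common 1337speak
# replacements, not an exhaustive or authoritative list.
LEET_MAP = {
    "|2": "r",
    "4": "a",
    "3": "e",
    "1": "i",
    "0": "o",
    "7": "t",
    "5": "s",
    "6": "g",
    "@": "a",
    "+": "t",
}


def normalize_leet(text: str) -> str:
    """Return a lowercase copy of text with the known substitutions reversed."""
    result = text.lower()
    # Replace longer tokens first so "|2" is handled before single characters.
    for token in sorted(LEET_MAP, key=len, reverse=True):
        result = result.replace(token, LEET_MAP[token])
    return result


def mentions(keyword: str, text: str) -> bool:
    """Look for the keyword in both the raw text and its normalized form."""
    keyword = keyword.lower()
    return keyword in text.lower() or keyword in normalize_leet(text)


print(normalize_leet("71M3 M464z1n3'5 |23p0|273|2"))
print(mentions("intel", "n3w 1n73l pr0c3550r"))
```

Normalization rewrites genuine digits as well (a model number such as "Mini 3" would become "mini e"), which is why the sketch checks the raw text and the normalized text rather than the normalized text alone. Personal substitutions that are not in the table will still be missed.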

LOLCATS

Sites such as General Mayhem, 4chan.org and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants, known as LOLCATS (pronounced “LAHL cats”). The meme originated as a series of cute pictures of kittens doing things, with the accompanying text purported to be the voice of the cat. Cats, according to the meme, have unique spellings of English and poor grammar, and prefer the “Impact” font. Eventually, kids began using LOLCATS as an accepted form of writing in text messages, instant messages, email and even speech.

Like 1337speak, LOLCATS speak can be difficult, if not impossible, for monitoring tools to parse as plain English. Consider the sentence used for the 1337speak example, in English, then in LOLCATS:

“Time Magazine’s reporter had no idea what we were after.”

“TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZ”

Intentional Misspellings

Finally, members of Generation Y generally do not spell correctly, sometimes out of laziness and sometimes, like hackers, to intentionally disguise their messages from authority figures. This may not matter to a company monitoring the conversations of senior citizens, but if the target audience is the highly sought-after 18-24 crowd, it is an issue that must be understood. Here is a real example found on MySpace, from a 16-year-old girl to her friends:

“HAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHING”

Generational Differences in Emoticons

Microblogging platforms like Twitter and Foursquare, which necessitate short messaging, seem almost devoid of emoticons. It is our theory that hashtags (short linked codes preceded by the pound sign, #) take the place of emoticons on microblogging platforms, as many hashtags are used sarcastically, such as #whatever or #ilovemylife.

There are distinct differences between the types of emoticons created by the different birth generations in the United States. Notice that with each generation, the “faces” become slightly more realistic.

• The so-called Silent Generation (1925-1945) is the least likely to use emoticons in writing, other than the most basic (smiles and frowns).

• The Baby Boomers (1946-1963) use emoticons sparingly but nevertheless use more than just the :-) and :-( symbols. They will include others such as :-/ (unsure), :-O (surprised) and ;-) (wink). Notice the addition of a nose formed with the hyphen key.

• Generation Xers (1964-1980) use the most emoticons of the older three generations. They include unusual emoticons such as >--(°> (dead fish) and :^p (sticking out tongue), and even emoticons meant sexually, such as (o) for breasts. Noses are often present, usually with a caret (^) in place of a hyphen, although hyphens are prevalent as well.

• It is with Generation Y (1981-2000) that we see the greatest change in emoticons, where the “faces” move from sideways to forward facing, a style taken from the Japanese kaomoji. Compare the symbol for wink between Generations X and Y: ;^) and (0_-).

Silent Generation Wink: ;)

Baby Boomer Wink: ;-)

Generation X Wink: ;^)

Generation Y Wink: (O_-)
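The wink comparison above can be turned into a simple lookup. The Python sketch below is a rough illustration, not a validated classifier: the `WINK_BY_GENERATION` table and the `guess_generations` function are hypothetical names, and the leading semicolons are an assumption about characters that appear to have been lost from the original figure. A match should be treated as one weak signal to weigh alongside other evidence about the writer.

```python
from typing import List

# Wink forms from the comparison above. The leading ";" characters are an
# assumption about what the original figure showed.
WINK_BY_GENERATION = {
    ";)": "Silent Generation",
    ";-)": "Baby Boomer",
    ";^)": "Generation X",
    "(O_-)": "Generation Y",
}


def guess_generations(message: str) -> List[str]:
    """Return the generation label for each known wink found in the message."""
    # Emoticons are treated as whitespace-separated tokens, so a wink glued
    # to a word (for example "later;)") is deliberately not matched here.
    return [
        WINK_BY_GENERATION[token]
        for token in message.split()
        if token in WINK_BY_GENERATION
    ]


print(guess_generations("see you at the meet ;^)"))
print(guess_generations("no emoticons in this one"))
```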

Gender Analysis

Gender analysis may be unfamiliar to most, and many may even question why it is necessary. The reason is simple: comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to change drastically and suddenly. In the latter case, there is the distinct possibility of profile fraud.

Some individuals may pretend to be the opposite gender for various reasons: to pretend to be another person for a prank; to assume the identity of another for fraudulent reasons; to pretend to be the opposite gender for sexual reasons; or to pretend to be another for undercover work, as in vice-squad or detective work.

Solutions

Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps.

The following sources can be used to determine additional relevant keywords:

• The Urban Dictionary (http://urbandictionary.com): Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. It is excellent for typing in industry terms to see if variations exist and where, regionally, they are used.

• Google Insights (http://google.com/insights/search): Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days to as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts, and news headlines plotted on trend lines.

• Google AdWords (http://adwords.google.com): AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.

• Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.

• Gender Genie (http://bookblog.net/gender/genie.php): A free tool that estimates the gender of a writer. Paste text into a field and run the algorithm.

With these additional keywords, misspellings, slang, nicknames and regional variants, the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand.

Benefits

It is no longer an option to be naïve enough to believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.

Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring.

By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and can glean far more learnings from its customer base.

Case Study: Chevrolet Cobalt

The phenomenon of linguistic variants was first noticed by Linabury and Macemore in 2007, and described to General Motors, while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing (along with Honda Accords) into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.). The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers to non-Chevrolet owners.

Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states. During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt.” Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt.” C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced) it is possible that engineering names are known externally.

Macemore then theorized that these terms were surfacing often enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 50%, and the client gained insight into how their vehicles were referred to by the most influential purchasers of their product.
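The effect of an expanded keyword set can be measured by counting matching posts before and after the new terms are added. The Python sketch below assumes a plain list of post texts. The posts are invented for illustration, the function names are hypothetical, and the resulting percentage has no connection to the figures reported in this case study.

```python
import re


def count_matches(posts, keywords):
    """Count posts containing at least one keyword as a whole word."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    # The lookarounds stop "Balt" from matching inside a word like "Baltic".
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return sum(1 for post in posts if pattern.search(post))


def percent_increase(before, after):
    """Relative growth of the matched set, as a percentage."""
    return (after - before) / before * 100.0


# Invented sample posts, for illustration only.
posts = [
    "Just lowered my Cobalt, looks great",
    "Anyone tuning a Balt around Chicago?",
    "My C-Balt needs new rims",
    "The C-Car handles better than I expected",
    "Cobalt SS or Accord for a first build?",
    "Thinking about a Baltic Sea cruise",
]

base_keywords = ["Cobalt"]
expanded_keywords = base_keywords + ["Balt", "C-Car", "C-Balt"]

before = count_matches(posts, base_keywords)
after = count_matches(posts, expanded_keywords)
print(before, after, percent_increase(before, after))
```

Whole-word matching is a deliberate choice here: short nicknames such as "Balt" are far more likely than formal brand names to appear inside unrelated words.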

Case Study: OnStar™

OnStar™ is a multimillion-dollar company that produces a telematics system for vehicles. As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents, OnStar™’s corporate marketing team wanted up-to-the-minute reports on what their subscribers, their detractors and the media were saying. In 2007, OnStar™ hired Campbell-Ewald’s Social Media Team to monitor conversations and report back with weekly findings, and daily with any outstanding conversations or topics.

Campbell-Ewald’s Social Media Team quickly discovered there would be a few barriers to accurate monitoring. For example, people discussing certain television shows were appearing in the feed, with sentences like “Did you see what happened on Star Search last night?” or “There was one episode on Star Trek where…” These false positives were quickly weeded out through exclusionary phrases added to the keyword set.

The team also discovered linguistic variants of OnStar™ appearing in the conversations of loyal fans and influencers, which included several hackers. Some hackers were tweaking OnStar™ at home (similar to the jail-breaking of iPhones) for fun. We found that they used numerous variants of OnStar™, including On, On Star, On_Star, NOnStar, ONStar, OffStar, On-Star, OnsStar and BlondeStar (in reference to a YouTube parody of OnStar™).

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 109%, and the client gained insight into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base: hackers.
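The combination of variant keywords and exclusionary phrases described in this case study can be sketched as a two-step filter. This Python example is an illustration only: it uses a subset of the variants listed above, the two exclusion phrases come from the false positives mentioned, and the names `VARIANTS`, `EXCLUSIONS` and `is_relevant` are hypothetical rather than part of Radian6 or any other tool.

```python
import re

# A subset of the variants listed above, plus the formal brand name.
VARIANTS = ["OnStar", "On Star", "On_Star", "On-Star", "OnsStar", "OffStar", "BlondeStar"]

# Exclusion phrases based on the false positives described in the text.
EXCLUSIONS = ["Star Search", "Star Trek"]


def build_pattern(terms):
    """Compile a case-insensitive pattern that matches any of the terms."""
    return re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)


VARIANT_RE = build_pattern(VARIANTS)
EXCLUSION_RE = build_pattern(EXCLUSIONS)


def is_relevant(post):
    """Keep a post if it matches a variant and contains no exclusion phrase."""
    return bool(VARIANT_RE.search(post)) and not EXCLUSION_RE.search(post)


samples = [
    "Did you see what happened on Star Search last night",
    "There was one episode on Star Trek where",
    "My On-Star unit got me home after the crash",
    "That BlondeStar parody is hilarious",
]
for sample in samples:
    print(is_relevant(sample), "-", sample)
```

Dropping an entire post on any exclusion phrase is the simplest policy, but it would also discard a post that mentions both the brand and a television show, so a production keyword set may need finer rules.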

Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cellphones and removing “noise” about Dell laptops makes the results more accurate.

By adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’, the feed examples are more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows us that consumers are also searching for it as a ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates that any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5,000%.

[Screenshots: Radian6 (two views), Google Insights, Urban Dictionary]

Summary

Campbell-Ewald has been an active participant in social media since early 2006. Their lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to understanding the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media team addressed these linguistic challenges in their own client monitoring projects over the past five years, utilizing the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.

Contact

Dave Linabury, Group Director, Social Media
DaveLinabury@c-e.com

Jason Macemore, Digital Strategist
JasonMacemore@c-e.com

Gary Olson, Senior Social Media Planner
GaryOlson@c-e.com

Campbell-Ewald
30400 Van Dyke Ave., Warren, Michigan 48093
+1 (586) 574-3400
http://c-e.com

Page 2: Linguistic challenges associated with monitoring social media

Copyright copy2010 Linabury Macemore and Olson Edited by Christopher Moritz and Helena Dobbins All rights reserved

Radian6 Chevrolet OnStar the United States Navy and Google retain the copyrights of their respective corporations Campbell-Ewald uses the services of both Radian6 and Google along with internally devel-oped proprietary set of tools for gathering data on social media for its clients as well as its own internal data Campbell-Ewald part of the Interpublic Group of agencies has been in the business of marketing communications and strategy since 1911

2

Executive SummarySocial media monitoring tools such as Radian6 Sysomos and Scout have tremendous capabilities for pulling in highly targeted conversations taking place around a topic person or brand from all social media platforms These tools also enable a corporation or researcher to find the influencers around a topic

While the targeting potential is astonishing variations in languages slang regional idioms misspellings and nicknames for topics and brands make accurate targeting difficult Whatrsquos more the influencers around a brand or topic are the most likely to use a nickname slang term or personal parlance known only to their social circle This makes understanding of these language variants critical

Campbell-Ewaldrsquos Social Media team has addressed these linguistic challenges in their own client monitoring projects over the past five years by

bull Determiningcurrentsearchtrends around a topic This identifies not only how users are searching (which indicates intent) but aids in the identification of misspellings and relevant associated topics

bull Determiningtheageandgenderofthewriter through the use of various tools knowledge of generational writing patterns and comparing regional variations against reliable reference sources of slang

bull Identifyingtheinfluencers and recording their linguistic patterns

bull Identifyingemoticonsand comparing them to known regional and generational variants

This paper will detail these challengesmdashwhich are largely unknown to most users of these monitoring toolsmdashin hopes that their own monitoring will be more accurate and complete

3

BackgroundHistoryCampbell-Ewald has been an active participant in social media since early 2006 Their lead social media planners Dave Linabury and Jason Macemore were among the first to develop social media monitoring tools such as Fat Pipe and Sentimentor It was the development of these early tools mdash created to meet their own needs as researchers mdash that led to understanding the challenges raised in this paper

Linabury and Macemore quickly discovered that monitoring tools were not always able to spider all of the conversations that were known to exist At first they theorized that conversations werenrsquot being pulled in because different coding methods and naming conventions for Web site sections made it difficult for the tools to parse data

As new technologies made parsing data easier the initial theory proved to be an incorrect assessment It was ascertained in mid-2007 that linguistic variants were the cause

SuccessesSince the discovery of the linguistic variant sets Campbell-Ewald has become the nationrsquos leader in social media monitoring They have been tasked with monitoring data for several United States government agencies including the United States Navy the United States Mint the United States Naval Academy the FBI the Center for Disease Control (CDC)and the Environmental Protection Agency (EPA)

In addition to government clients and projects Campbell-Ewaldrsquos social media team under the leadership of Linabury also provides monitoring for dozens of Fortune 500 clients while garnering numerous awards such as a Gold Echo Award a Silver Effie Best Military Site of 2009 and Best Social Media Strategy among others

4

Target MarketBased on usage trends the target audience for social media monitoring applications can be divided into two main segments internalandexternal

Categorically corporations and public relations firms tend to use monitoring tools for internally-driven ends These typically include reputation management crisis management and as a clipping service to capture media mentions

Keyword strategies for these approaches are typically limited to formal brand names the CEOrsquos name and associated marketing terminology They rarely take into consideration linguistic variations context or subtle sentiment variations

Inversely advertising agencies researchers and social media agencies tend to retain an external focus to their monitoring efforts concentrating on sentiment analysis brand perception and marketing effectiveness and awareness

External monitoring tends to consider contextual relevance far more than PR firms do but most still lack the incorporation of (or even existence of) language variants that need to be considered for accurate and inclusive brand monitoring in the social space

5

Business ChallengeThe User Base Grows AnnuallyNearly 40 of corporations are turning to social monitoring to keep abreast of whatrsquos being said about their brand Many take this on internally but most hire outside social media companies or agencies However virtually none of them are aware that they are not seeing the entire conversation and blindly put faith in their chosen monitoring tool that it will fulfill their needs and find all of the relevant online discussions about their brand product or services

This is not the case The tools are limited by the thoroughness of the toolrsquos operator and how much time is spent determining appropriate keywords Most administrators make the assumption that the terms they use as marketing descriptors (eg marketing copy search terms and PR copy) are enough

Many monitoring tools are set up for the corporation by the tool manufacturer It is highly unlikely the tool creator could understand a brand as well as the employees agencies or long-standing vendors of the corporation

The reality is the marketing descriptors are generally one-sided somewhat aspirational and rarely match customer expectations and perceptions Few companies use keywords describing themselves as ldquocheaprdquo ldquoaveragerdquo ldquoacceptablerdquo ldquopoorrdquo ldquopatheticrdquo ldquogood enoughrdquo etcrdquo however those are precisely the terms consumers use with respect to brands For proof of this one need only see how brands are described at BrandTagsnet where tens of thousands of consumers have used those exact terms to describe hundreds of corporations in ever-growing tag clouds of user-generated terms

6

Tagcloud about Chrysler from BrandTagsnet

The Problem with Monitoring LanguageLanguages constantly evolve They evolve nationally regionally and hyper-locally For example a popular phrase among teens nationally to describe something amazing is ldquooff the hookrdquo Regional variants such as ldquooff the chainrdquo [Detroit] and ldquooff the heezyrdquo [Brooklyn] exist as well Hyper-locally a neighborhood may have yet another variant shared among friends but not generally known outside that block

This presents unique challenges to the researcher who is using social media monitoring tools If a phrase is known it will be used as a key search term for the tool to use If however more people are using lesser known regional variants the tool loses effectiveness

1337speakThere exist several linguistic phenomenon online that do not exist offline One is the well-known variant known as hacker speak or ldquo1337speakrdquo (Elite speak) This variation goes back more than a decade online It was developed by computer hackers in an effort to make their messages to each other difficult to read by outsiders Words are deconstructed to their visual elements and replaced with alpha-numeric and punctuation equivalents that bear a passing resemblance to the original letter form

For example a capital lsquoTrsquo may be replaced with the number 7 or a + sign The word lsquoatrsquo will be replaced with the sign Capital lsquoErsquo becomes a 3 and so on There is no sequencing to the replacements it is simply a matter of finding letters numbers and punctuation that can be substituted Indeed cleverness is praised and while online ldquo1337speak generatorsrdquo exist which ldquotranslaterdquo text back and forth between English and 1337speak each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt

Here is a sample sentence in English first then 1337speak ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquo71M3 M464z1n3rsquo5 |23p0|273|2 H4| n0 1|34 wH47 w3 w3|23 4f73|2rdquo

If hackers were discussing a new Intel processor in 1337speak, no monitoring tool would be able to pick up that conversation, as no complete English words exist in hacker speak for the tool to pick up.
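One way a monitoring team might partially work around this is to normalize text before keyword matching. The following is a minimal sketch in Python, not a feature of any monitoring tool named in this paper. The substitution table is an assumption limited to replacements visible in the sample above, and the names `deleet` and `mentions` are hypothetical. Because each writer makes personal substitutions, any fixed table will miss some messages.

```python
# Hypothetical substitution table, limited to replacements seen in the sample above.
MULTI_CHAR = {"|2": "r"}  # multi-character glyphs must be replaced first
SINGLE_CHAR = str.maketrans({
    "7": "t", "+": "t", "3": "e", "1": "i", "4": "a",
    "6": "g", "5": "s", "0": "o", "@": "at",
})


def deleet(text: str) -> str:
    """Return a lowercase, best-effort plain-English reading of 1337speak."""
    out = text.lower()
    for glyph, letter in MULTI_CHAR.items():
        out = out.replace(glyph, letter)
    return out.translate(SINGLE_CHAR)


def mentions(post: str, keyword: str) -> bool:
    """Check the keyword against both the raw post and its normalized form."""
    needle = keyword.lower()
    return needle in post.lower() or needle in deleet(post)


print(deleet("71M3 M464z1n3'5 |23p0|273|2"))
print(mentions("n3w 1n73l pr0c3550r", "Intel"))
```

The first call prints `time magazine's reporter` and the second prints `True`. Normalizing rewrites genuine digits as well (a model number such as "Mini 3" would be altered), which is why `mentions` checks the raw text and the normalized text rather than replacing one with the other.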

LOLCATS

Sites such as General Mayhem, 4chan.org and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants, known as LOLCATS (pronounced “LAHL cats”). The meme originated as a series of cute pictures of kittens doing things, with the accompanying text purported to be the voice of the cat. Cats, according to the meme, have unique spellings of English, use poor grammar and prefer the “Impact” font. Eventually kids began using LOLCATS as an accepted form of writing in text messages, instant messages, email and even speech.

Like 1337speak, LOLCATS speak can be difficult, if not impossible, for monitoring tools to parse as plain English. Consider the sentence used for the 1337speak example, in English and then in LOLCATS:

“Time Magazine’s reporter had no idea what we were after”
“TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZ”

Intentional Misspellings

Finally, members of Generation Y generally do not spell correctly, sometimes out of laziness and sometimes, like hackers, to intentionally disguise their messages from authority figures. This may not matter to a company monitoring the conversations of senior citizens, but if the target audience is the highly sought-after 18-24 crowd, it is an issue that must be understood. Here is a real example found on MySpace, from a 16-year-old girl to her friends:

“HAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHING”
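Misspellings of this kind can sometimes be caught with approximate string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library. The keyword list, the 0.7 cutoff and the function name `likely_keywords` are illustrative assumptions, not settings taken from any monitoring tool.

```python
import difflib

# Hypothetical keyword list a monitor might already be tracking.
KEYWORDS = ["time", "magazine", "reporter"]


def likely_keywords(post: str, keywords=KEYWORDS, cutoff: float = 0.7) -> dict:
    """Map each token that closely resembles a keyword to that keyword."""
    hits = {}
    for token in post.lower().split():
        # n=1 keeps only the single best candidate at or above the cutoff.
        match = difflib.get_close_matches(token, keywords, n=1, cutoff=cutoff)
        if match:
            hits[token] = match[0]
    return hits


print(likely_keywords("REPORTR WUT TIEM"))
```

Here "REPORTR" and "TIEM" are mapped to "reporter" and "time", while "WUT" resembles none of the keywords and is ignored. Lowering the cutoff catches more distorted spellings at the cost of more false matches, so flagged tokens are better treated as candidates for a human to review than as confirmed keywords.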

Generational Differences in Emoticons

Microblogging platforms like Twitter and Foursquare, which necessitate short messaging, seem almost devoid of emoticons. It is our theory that hashtags (short linked codes preceded by the pound sign, #) take the place of emoticons on microblogging platforms, as many hashtags are used sarcastically, such as #whatever or #ilovemylife.

There are distinct differences between the types of emoticons created by the different birth generations in the United States. Notice that with each generation, the “faces” become slightly more realistic.

• The so-called Silent Generation (1925-1945) is the least likely to use emoticons in its writing, other than the most basic (smiles and frowns).

• The Baby Boomers (1946-1963) use emoticons sparingly but nevertheless use more than just the :-) and :-( symbols. They will include others such as :-/ (unsure), :-O (surprised) and ;-) (wink). Notice the addition of a nose, formed with the hyphen key.

• Generation Xers (1964-1980) use the most emoticons of the older three generations. They include unusual emoticons such as >--(°> (dead fish) and :^p (sticking out tongue), even emoticons meant sexually, such as (o) for breasts. Noses are often present, usually with a caret ^ in place of a hyphen, although hyphens are prevalent as well.

• It is with Generation Y (1981-2000) that we see the greatest change in emoticons, where the “faces” move from sideways to forward-facing, a style taken from the Japanese kaomoji. Compare the symbol for wink between Generations X and Y: ;^) and (0_-)

Silent Generation wink: ;)

Baby Boomer wink: ;-)

Generation X wink: ;^)

Generation Y wink: (O_-)
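The nose conventions described above lend themselves to a rough pattern check. The following sketch uses Python's `re` module. The patterns and the function name `emoticon_style` are assumptions built only from the examples in this section, and the result indicates an emoticon style, not a verified age, since hyphen noses appear among Generation X writers as well.

```python
import re

# Ordered from most to least specific. Each pattern is an assumption based
# only on the generational examples described in this section.
PATTERNS = [
    ("Generation Y", re.compile(r"\([^()\s]_[^()\s]\)")),  # forward-facing, e.g. (O_-)
    ("Generation X", re.compile(r"[:;]\^[)(pPO]")),        # caret nose, e.g. ;^)
    ("Baby Boomer", re.compile(r"[:;]-[)(/O]")),           # hyphen nose, e.g. ;-)
    ("Silent Generation", re.compile(r"[:;][)(]")),        # no nose, e.g. ;)
]


def emoticon_style(text: str):
    """Return the first generational style whose pattern appears, else None."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return None


for sample in ["see you then ;)", "nice one ;-)", "ha ;^)", "lol (O_-)"]:
    print(sample, "->", emoticon_style(sample))
```

A signal like this is best combined with other evidence, such as slang and spelling patterns, before drawing any conclusion about a writer.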

Gender Analysis

Gender analysis may be unfamiliar to most, and many may even question why it is necessary. The reason is simple: comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to change suddenly and drastically. In the latter case there is the distinct possibility of profile fraud.

Some individuals may pretend to be the opposite gender for various reasons: to pretend to be another person for a prank, to assume the identity of another for fraudulent reasons, to pretend to be the opposite gender for sexual reasons, or to pretend to be another for undercover work, as in vice-squad or detective work.

Solutions

Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps.

The following sources can be used to determine additional relevant keywords:

• The Urban Dictionary (http://urbandictionary.com): Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. It is excellent for typing in industry terms to see whether variations exist and where they are used regionally.

• Google Insights (http://google.com/insights/search): Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days to as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts, and news headlines plotted on trend lines.

• Google AdWords (http://adwords.google.com): AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.

• Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.

• Gender Genie (http://bookblog.net/gender/genie.php): A free tool that can identify the gender of a writer when text is pasted into a field and the algorithm is run.

With these additional keywords, misspellings, slang, nicknames and regional variants, the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand.

Benefits

It is no longer an option to be naïve enough to believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, and preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.

Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring.

By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and can glean far more learnings from its customer base.

Case Study: Chevrolet Cobalt

The phenomenon of linguistic variants was first noticed by Linabury and Macemore in 2007, and described to General Motors, while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing (along with Honda Accords) into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.). The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers to non-Chevrolet owners.

Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states. During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt.” Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt.” C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced) it is possible that engineering names are known externally.

Macemore then theorized that these terms were surfacing enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 50%, and the client gained insight and learnings into how its vehicles were referred to by the most influential purchasers of its product.
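The lift from an expanded keyword set can be expressed as a simple percentage increase over the baseline count. The sketch below shows the arithmetic only. The conversation counts are hypothetical, chosen to reproduce a 53% increase, and are not the actual figures from this case study.

```python
def percent_increase(before: int, after: int) -> float:
    """Percentage growth in matched conversations after expanding the keyword set."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) * 100 / before


# Hypothetical counts, chosen only to illustrate a 53% lift.
baseline, expanded = 1000, 1530
print(f"{percent_increase(baseline, expanded):.0f}% more conversations")
```

Measured this way, an increase of more than 100% means the expanded keyword set found more than twice as many conversations as the original set.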

Case Study: OnStar™

OnStar™ is a multimillion-dollar company that produces a telematics system for vehicles. As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents, OnStar™’s corporate marketing team wanted up-to-the-minute reports on what its subscribers, its detractors and the media were saying. In 2007, OnStar™ hired Campbell-Ewald’s Social Media Team to monitor conversations and report back weekly with findings, and daily with any outstanding conversations or topics.

Campbell-Ewald’s Social Media Team quickly discovered there would be a few barriers to accurate monitoring. For example, people discussing certain television shows were appearing in the feed, in sentences like “Did you see what happened on Star Search last night?” or “There was one episode on Star Trek where…” These false positives were quickly weeded out through exclusionary phrases added to the keyword set.

The team also discovered linguistic variants of OnStar™ appearing in the conversations of loyal fans and influencers, which included several hackers. Some hackers were tweaking OnStar™ at home (similar to the jailbreaking of iPhones) for fun. We found that they used numerous variants of OnStar™, including On, On Star, On_Star, NOnStar, ONStar, OffStar, On-Star, OnsStar and BlondeStar (in reference to a YouTube parody of OnStar™).

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 109%, and the client gained insight and learnings into how OnStar was being referred to by the most influential purchasers of its product and by an unexpected fan base: hackers.
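The combination of variant keywords and exclusionary phrases described in this case study can be sketched as a small filter. This is an illustration in plain Python, not the configuration actually used in the monitoring tool. The function name `is_relevant` is hypothetical, the variant list is taken from the text above (the bare "On" is left out as too ambiguous to match by itself), and the exclusion phrases are assumed from the two false positives quoted.

```python
import re

# Variants reported in the case study; matching is case-insensitive,
# so "ONStar" is covered by "OnStar".
VARIANTS = ["OnStar", "On Star", "On_Star", "NOnStar",
            "OffStar", "On-Star", "OnsStar", "BlondeStar"]
# Assumed exclusionary phrases, based on the false positives quoted above.
EXCLUSIONS = ["on Star Search", "on Star Trek"]


def _phrase_pattern(phrases):
    """Build one case-insensitive pattern that matches any whole phrase."""
    ordered = sorted(phrases, key=len, reverse=True)  # prefer longer phrases
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


INCLUDE = _phrase_pattern(VARIANTS)
EXCLUDE = _phrase_pattern(EXCLUSIONS)


def is_relevant(post: str) -> bool:
    """Strip excluded phrases first, then look for any brand variant."""
    without_noise = EXCLUDE.sub(" ", post)
    return INCLUDE.search(without_noise) is not None


print(is_relevant("Did you see what happened on Star Search last night?"))
print(is_relevant("My On_Star unit got a tweak"))
```

Removing the excluded phrase, instead of rejecting the whole post, keeps a post that mentions both a television show and the product. The first call prints `False` and the second prints `True`.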

Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate.

By adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’, the feed examples are more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows us that consumers are also searching for it as ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates that any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5000%.
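The keyword-gathering steps in this demonstration can be sketched as a small data structure. The `KeywordSet` class below is hypothetical and does not represent Radian6's interface or query syntax; it only shows how terms from different sources might be merged, with an unwanted term skipped and an exclusionary keyword applied. Generic slang such as "cellie" is left out here because, matched alone, it would pull in posts about any phone.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordSet:
    """Hypothetical container for inclusion and exclusion keywords."""
    include: list = field(default_factory=list)
    exclude: list = field(default_factory=list)

    def add(self, terms, skip=()):
        """Add new lowercase terms, ignoring duplicates and skipped terms."""
        skipped = {s.lower() for s in skip}
        for term in terms:
            t = term.lower()
            if t not in skipped and t not in self.include:
                self.include.append(t)

    def matches(self, post: str) -> bool:
        """True if the post has an included term and no excluded term."""
        text = post.lower()
        if any(term in text for term in self.exclude):
            return False
        return any(term in text for term in self.include)


# Start from the topic and the exclusionary keyword used in the demonstration.
dell = KeywordSet(include=["dell mini 3"], exclude=["laptop"])

# Terms reported by the Google Insights search; the Mini 5 is a different model.
dell.add(["cellular dell", "dell android", "dell android phone",
          "dell smartphone", "dell mini 5"], skip=["dell mini 5"])

print(dell.matches("Just ordered the Dell smartphone!"))
print(dell.matches("My Dell laptop died"))
```

Keeping the skipped term explicit in the call records the decision to leave the Mini 5 out, which makes the keyword set easier to audit later.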

[Figures: screenshots from Radian6, Google Insights and Urban Dictionary]

Summary

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to understanding the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media team addressed these linguistic challenges in its own client monitoring projects over the past five years using the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.

Contact

Dave Linabury, Group Director, Social Media
DaveLinabury@c-e.com

Jason Macemore, Digital Strategist
JasonMacemore@c-e.com

Gary Olson, Senior Social Media Planner
GaryOlson@c-e.com

Campbell-Ewald
30400 Van Dyke Ave., Warren, Michigan 48093
+1 (586) 574-3400
http://c-e.com

The Problem with Monitoring LanguageLanguages constantly evolve They evolve nationally regionally and hyper-locally For example a popular phrase among teens nationally to describe something amazing is ldquooff the hookrdquo Regional variants such as ldquooff the chainrdquo [Detroit] and ldquooff the heezyrdquo [Brooklyn] exist as well Hyper-locally a neighborhood may have yet another variant shared among friends but not generally known outside that block

This presents unique challenges to the researcher who is using social media monitoring tools If a phrase is known it will be used as a key search term for the tool to use If however more people are using lesser known regional variants the tool loses effectiveness

1337speakThere exist several linguistic phenomenon online that do not exist offline One is the well-known variant known as hacker speak or ldquo1337speakrdquo (Elite speak) This variation goes back more than a decade online It was developed by computer hackers in an effort to make their messages to each other difficult to read by outsiders Words are deconstructed to their visual elements and replaced with alpha-numeric and punctuation equivalents that bear a passing resemblance to the original letter form

For example a capital lsquoTrsquo may be replaced with the number 7 or a + sign The word lsquoatrsquo will be replaced with the sign Capital lsquoErsquo becomes a 3 and so on There is no sequencing to the replacements it is simply a matter of finding letters numbers and punctuation that can be substituted Indeed cleverness is praised and while online ldquo1337speak generatorsrdquo exist which ldquotranslaterdquo text back and forth between English and 1337speak each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt

Here is a sample sentence in English first then 1337speak ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquo71M3 M464z1n3rsquo5 |23p0|273|2 H4| n0 1|34 wH47 w3 w3|23 4f73|2rdquo

7

If hackers were discussing a new Intel processor in 1337speak no monitoring tool would be able to pick up that conversation as no complete English words exist in hacker speak for the tool to pick up

LOLCATSSites such as General Mayhem 4chanorg and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants known as LOLCATS (pronounced ldquoLAHL catsrdquo) The meme originated as a series of cute pictures of kittens doing things with the accompanying text purported to be the voice of the cat Cats according to the meme have unique spellings of English poor grammar and prefer the ldquoImpactrdquo font Eventually kids began using LOLCATS as an accepted form of writing in text messages instant messages email and even speech

Like 1337speak LOLCATS speak can be difficult if not impossible for monitoring tools to parse as plain English Consider the sentence used for the 1337speak example in English then in LOLCATS

ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquoTIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZrdquo

Intentional MisspellingsFinally Generation Y general do not spell correctly sometimes out of laziness sometimes mdash like hackers mdashto intentionally disguise their messages from authority figures This may not matter to a company monitoring the conversations of senior citizens but if the target audience is the highly sought after 18-24 crowd it is an issue that must be understood Here is a real example found on MySpace from a 16 year-old girl to her friends

ldquoHAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHINGrdquo

8

Generational Differences in EmoticonsMicroblogging platforms like Twitter and Foursquare which necessitate short messaging seem almost devoid of emoticons It is our theory that hashtagsmdashshort linked codes preceded by the pound sign ()mdashtake the place of emoticons on microblogging as many hashtags are used sarcastically such as whatever or ilovemylife

There are distinct differences between the types of emoticons created by the different birth generations in the United States Notice that with each generation the ldquofacesrdquo become slightly more realistic

bull The so-called SilentGeneration (1925-1945) are the least likely to use emoticons in speech other than the most basic (smiles and frowns)

bull The BabyBoomers (1946-1963) use emoticons sparingly but nevertheless use more than just -) and -( symbols They will include others such as - (unsure) -O (surprised) and -) (wink) Notice the addition of a nose formed with the hyphen key

bull GenerationXers (1964-1980) use the most emoticons of the older three generations They include unusual emoticons such as gt--(deggt (dead fish) and ^p (sticking out tongue) even emoticons meant sexually such as (o) for breasts Noses are often present usually with a carat ^ in place of a hyphen although hyphens are prevalent as well

bull It is with GenerationY (1981-2000) that we see the greatest change in emoticons where the ldquofacesrdquo move from sideways to forward facing taken from the Japanese kaomoji Compare the symbol for wink between Generations X and Y ^) and (0_-)

9

Silent Generation Wink

)Baby Boomer Wink

-)Generation X Wink

^)Generation Y Wink

(O_-)

Gender Analysis Gender Analysis may be unfamiliar to most and many may question even why it is necessary The reason is simple Comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to drastically change suddenly In the latter case there is the distinct possibility of profile fraud

Some individuals may pretend to be the opposite gender for various reasons to pretend to be another person for a prank to assume the identity of another for fraudulent reasons to pretend to be the opposite gender for sexual reasons to pretend to be another for undercover work as in vice-squad or detective work

SolutionsRelying solely on internal industry and marketing keywords will not suffice It is crucial to take additional steps

The following sources be used to determine additional relevant keywords

bull TheUrbanDictionary httpurbandictionarycom Continually updated the Urban Dictionary is easily the largest source of regional national and international slang on the Internet Excellent for typing in industry terms to see if variations exist and regionally where they are used

bull GoogleInsightshttpgooglecominsightssearch Google Insights allow searches to go from global down to individual cities with timeframes from the last 30 days as far back as 2004 They provide trends on rising search patterns based on the root key term maps indicating geo-density forecasts and news headlines plotted on trend lines

10

bull GoogleAdWords httpadwordsgooglecom AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys The tool can also be used to help select better keywords for social media monitoring Keywords are shown by the latest search patterns with search quantities displayed

bull InfluencersAsk active and influential customers for terms nicknames etc If your company does not have a personal relationship with its influencers find and read their blogs and tweets paying close attention to the responses from their audiences Flag unusual words spellings and abbreviations

bull GenderGeniehttp bookblognetgendergeniephp A free tool that can identify the gender of the writer by pasting text into a field and running the algorithm

With these additional keywords misspellings slang nicknames and regional variants the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand

BenefitsIt is no longer an option to be naiumlve enough to actually believe that no one is talking All brands are being discussed by someone Only through the proper configuration of professional-grade monitoring tools like Radian6mdashand preferably under the guidance of a social media agency that specializes in monitoring and analysismdashcan a company expect to truly know what is being said about their brand

Not knowing how your brand is being discussed and described means that brand is not getting the entire picture as is the case with the reports from PR agencies and most internal social media monitoring

By applying these techniques and using these additional tools a brand can be certain of seeing the full picture and glean far more learnings from their customer base

11

Case Study Chevrolet CobaltThe phenomenon of linguistic variants was first noticed and described by Linabury and Macemore in 2007 to General Motors while they were monitoring conversations pertaining to the Chevrolet Cobaltmdasha small car that young males were customizingmdashalong with Honda Accordsmdashinto street rods (known regionally as Rice Rods Rice Burners Rice Rockets etc) The assignment was to find out what these young men were saying about the Cobalt as they were deemed by Chevrolet to be influencers to non-Chevrolet owners

Campbell-Ewaldrsquos monitoring was confined geographically to the Great Lakes states During the course of the monitoring Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a ldquoBaltrdquo Linabury noticed that conversations on the West side of Michigan referred to it as a ldquoC-Carrdquo or ldquoC-Baltrdquo C-Car was the internal name of the vehicle used by engineers but in Michigan (where the car is produced) it is possible that engineering names are known externally

Macemore then theorized that these terms were surfac-ing enough that they should be added to the keywords the monitoring tool was using to spider conversations After adding the new terms the number of conversations found by the tool increased by 53 This led to speculation that theinfluentialmembersofasocialcirclemaybemorelikelytohaveinternalnicknamesthanthoseoutsidethatcircle and that these names needed to be identified at the outset of any social media monitoring assignment to en-sure accurate monitoring and the largest possible data set

Result By adding the additional terms that were manually identified the conversational data set increased by more than 50 and the client gained insight and learnings into how their vehicles were referred to by the most influential purchasers of their product

12

Case Study OnStartradeOnStartrade is a multimillion dollar company that produces a telematics system for vehicles As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents OnStartradersquos corporate marketing team wanted up to the minute reports on what their subscribers were saying their detractors and the media In 2007 OnStartrade hired Campbell-Ewaldrsquos Social Media Team to monitor conversations and report back with weekly findings and daily with any outstanding conversations or topics

Campbell-Ewaldrsquos Social Media Team quickly discovered there would be a few barriers to accurate monitoring For example people discussing certain television shows were appearing in the feed Sentences like ldquoDid you see what happened onStar Search last nightrdquo or ldquoThere was one episode onStar Trek wherehelliprdquo These false positives were quickly weeded out through exclusionary phrases added to the keyword set

The team also discovered linguistic variants of OnStartrade appearing in the conversations of loyal fans and influencers which included several hackers Some hackers were tweaking OnStartrade at home (similar to the jail-breaking of iPhones) for fun We found that they used numerous variants of OnStartrade including On On Star On_Star NOnStar ONStar OffStar On-Star OnsStar and BlondeStar (in reference to a YouTube parody of OnStartrade)

Result By adding the additional terms that were manually identified the conversational data set increased by more than 109 and the client gained insight and learnings into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base hackers

13

Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate.

By adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’, the feed examples are more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows us that consumers are also searching for it as ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5,000%.
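The keyword set built up in this demonstration can be modeled as a small data structure. The sketch below is an assumption-laden illustration, not Radian6's interface: the `KeywordSet` class, its methods and the plain substring matching are all hypothetical, and the ‘cellphone’ narrowing keyword is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordSet:
    """Hypothetical container for included and exclusionary keywords."""
    include: set = field(default_factory=set)
    exclude: set = field(default_factory=set)

    def add_terms(self, terms, skip=()):
        """Add discovered terms, lowercased, except those listed in skip."""
        skipped = {t.lower() for t in skip}
        self.include |= {t.lower() for t in terms} - skipped

    def matches(self, post):
        """True if the post contains an included term and no excluded term."""
        text = post.lower()
        if any(term in text for term in self.exclude):
            return False
        return any(term in text for term in self.include)


# Start from the topic and the exclusionary keyword used in the demonstration.
dell = KeywordSet(include={"dell mini 3"}, exclude={"laptop"})

# Add the Google Insights terms, skipping the different model.
dell.add_terms(
    ["cellular dell", "dell android", "dell android phone",
     "dell smartphone", "dell mini 5"],
    skip=["dell mini 5"],
)

print(dell.matches("Loving my new Dell smartphone"))
print(dell.matches("dell mini 3 vs my old laptop"))
```

Generic slang such as “cellie” is not added here because, with simple substring matching, it would pull in posts about any phone; pairing it with a brand term would need a richer query model than this sketch provides.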

Figures: Radian6 (two screenshots), Google Insights, Urban Dictionary


Summary

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to understanding the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media team addressed these linguistic challenges in their own client monitoring projects over the past five years, utilizing the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.


Contact

Dave Linabury, Group Director, Social Media, DaveLinabury@c-e.com

Jason Macemore, Digital Strategist, JasonMacemore@c-e.com

Gary Olson, Senior Social Media Planner, GaryOlson@c-e.com

Campbell-Ewald, 30400 Van Dyke Ave., Warren, Michigan 48093, +1 (586) 574-3400, http://c-e.com


Background

History

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media planners, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring tools, such as Fat Pipe and Sentimentor. It was the development of these early tools, created to meet their own needs as researchers, that led to understanding the challenges raised in this paper.

Linabury and Macemore quickly discovered that monitoring tools were not always able to spider all of the conversations that were known to exist. At first, they theorized that conversations weren’t being pulled in because different coding methods and naming conventions for Web site sections made it difficult for the tools to parse data.

As new technologies made parsing data easier, the initial theory proved incorrect. It was ascertained in mid-2007 that linguistic variants were the cause.

Successes

Since the discovery of the linguistic variant sets, Campbell-Ewald has become the nation’s leader in social media monitoring. It has been tasked with monitoring data for several United States government agencies, including the United States Navy, the United States Mint, the United States Naval Academy, the FBI, the Centers for Disease Control and Prevention (CDC) and the Environmental Protection Agency (EPA).

In addition to government clients and projects, Campbell-Ewald’s social media team, under the leadership of Linabury, also provides monitoring for dozens of Fortune 500 clients, while garnering numerous awards such as a Gold Echo Award, a Silver Effie, Best Military Site of 2009 and Best Social Media Strategy, among others.


Target Market

Based on usage trends, the target audience for social media monitoring applications can be divided into two main segments: internal and external.

Categorically, corporations and public relations firms tend to use monitoring tools for internally driven ends. These typically include reputation management, crisis management and use as a clipping service to capture media mentions.

Keyword strategies for these approaches are typically limited to formal brand names, the CEO’s name and associated marketing terminology. They rarely take into consideration linguistic variations, context or subtle sentiment variations.

Conversely, advertising agencies, researchers and social media agencies tend to retain an external focus in their monitoring efforts, concentrating on sentiment analysis, brand perception, and marketing effectiveness and awareness.

External monitoring tends to consider contextual relevance far more than PR firms do, but most still lack the incorporation of (or even awareness of) the language variants that need to be considered for accurate and inclusive brand monitoring in the social space.


Business Challenge

The User Base Grows Annually

Nearly 40% of corporations are turning to social monitoring to keep abreast of what’s being said about their brand. Many take this on internally, but most hire outside social media companies or agencies. However, virtually none of them are aware that they are not seeing the entire conversation, and they blindly trust their chosen monitoring tool to fulfill their needs and find all of the relevant online discussions about their brand, product or services.

This is not the case. The tools are limited by the thoroughness of the tool’s operator and by how much time is spent determining appropriate keywords. Most administrators assume that the terms they use as marketing descriptors (e.g., marketing copy, search terms and PR copy) are enough.

Many monitoring tools are set up for the corporation by the tool manufacturer. It is highly unlikely the tool creator could understand a brand as well as the employees, agencies or long-standing vendors of the corporation.

The reality is that marketing descriptors are generally one-sided and somewhat aspirational, and rarely match customer expectations and perceptions. Few companies use keywords describing themselves as “cheap,” “average,” “acceptable,” “poor,” “pathetic,” “good enough,” etc.; however, those are precisely the terms consumers use with respect to brands. For proof of this, one need only see how brands are described at BrandTags.net, where tens of thousands of consumers have used those exact terms to describe hundreds of corporations in ever-growing tag clouds of user-generated terms.


Figure: Tag cloud about Chrysler from BrandTags.net

The Problem with Monitoring Language

Languages constantly evolve. They evolve nationally, regionally and hyper-locally. For example, a popular phrase among teens nationally to describe something amazing is “off the hook.” Regional variants such as “off the chain” [Detroit] and “off the heezy” [Brooklyn] exist as well. Hyper-locally, a neighborhood may have yet another variant shared among friends but not generally known outside that block.

This presents unique challenges to the researcher who is using social media monitoring tools. If a phrase is known, it will be used as a key search term for the tool. If, however, more people are using lesser-known regional variants, the tool loses effectiveness.

1337speak

There exist several linguistic phenomena online that do not exist offline. One is the well-known variant known as hacker speak or “1337speak” (Elite speak). This variation goes back more than a decade online. It was developed by computer hackers in an effort to make their messages to each other difficult for outsiders to read. Words are deconstructed to their visual elements and replaced with alphanumeric and punctuation equivalents that bear a passing resemblance to the original letter form.

For example, a capital ‘T’ may be replaced with the number 7 or a + sign. The word ‘at’ will be replaced with the @ sign. Capital ‘E’ becomes a 3, and so on. There is no sequencing to the replacements; it is simply a matter of finding letters, numbers and punctuation that can be substituted. Indeed, cleverness is praised, and while online “1337speak generators” exist which “translate” text back and forth between English and 1337speak, each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt.

Here is a sample sentence, in English first, then in 1337speak:

“Time Magazine’s reporter had no idea what we were after.”

“71M3 M464z1n3’5 |23p0|273|2 H4| n0 1|34 wH47 w3 w3|23 4f73|2”

If hackers were discussing a new Intel processor in 1337speak, a keyword-based monitoring tool would not pick up that conversation, as hacker speak contains no complete English words for the tool to match.
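One way to narrow this gap is to normalize common substitutions before keyword matching. The following Python sketch assumes a small, fixed substitution table: the 7, + and 3 mappings and the @ sign come from the examples above, while the remaining entries and the function name `deleet` are illustrative assumptions. Since each writer invents personal substitutions, no fixed table can be complete.

```python
# Multi-character substitutions must be applied before single characters,
# otherwise "|2" (r) would be broken apart by the digit mappings.
MULTI_CHAR = [("|2", "r")]

# Single-character substitutions; values may be longer than one character.
SINGLE_CHAR = str.maketrans({
    "7": "t", "+": "t", "3": "e", "@": "at",   # from the examples in the text
    "4": "a", "1": "i", "0": "o", "5": "s", "6": "g",  # assumed common forms
})


def deleet(text):
    """Return a lowercase, best-effort plain-English reading of 1337speak."""
    lowered = text.lower()
    for leet, plain in MULTI_CHAR:
        lowered = lowered.replace(leet, plain)
    return lowered.translate(SINGLE_CHAR)


print(deleet("71M3 M464z1n3"))
print(deleet("w3 w3|23 4f73|2"))
```

A normalized copy like this should be matched alongside the original text rather than in place of it, because the same table would also corrupt legitimate numbers (for example, the 3 in a model name).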

LOLCATS

Sites such as General Mayhem, 4chan.org and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants, known as LOLCATS (pronounced “LAHL cats”). The meme originated as a series of cute pictures of kittens doing things, with the accompanying text purported to be the voice of the cat. Cats, according to the meme, have unique spellings of English and poor grammar, and prefer the “Impact” font. Eventually, kids began using LOLCATS as an accepted form of writing in text messages, instant messages, email and even speech.

Like 1337speak, LOLCATS speak can be difficult, if not impossible, for monitoring tools to parse as plain English. Consider the sentence used for the 1337speak example, in English, then in LOLCATS:

“Time Magazine’s reporter had no idea what we were after.”

“TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZ”

Intentional Misspellings

Finally, members of Generation Y generally do not spell correctly, sometimes out of laziness and sometimes (like hackers) to intentionally disguise their messages from authority figures. This may not matter to a company monitoring the conversations of senior citizens, but if the target audience is the highly sought-after 18-24 crowd, it is an issue that must be understood. Here is a real example found on MySpace, from a 16-year-old girl to her friends:

“HAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHING”


Generational Differences in Emoticons

Microblogging platforms like Twitter and Foursquare, which necessitate short messaging, seem almost devoid of emoticons. It is our theory that hashtags (short linked codes preceded by the pound sign, #) take the place of emoticons on microblogging platforms, as many hashtags are used sarcastically, such as #whatever or #ilovemylife.

There are distinct differences between the types of emoticons created by the different birth generations in the United States. Notice that with each generation, the “faces” become slightly more realistic.

• The so-called Silent Generation (1925-1945) are the least likely to use emoticons in their writing, other than the most basic (smiles and frowns).

• The Baby Boomers (1946-1963) use emoticons sparingly, but nevertheless use more than just the :-) and :-( symbols. They will include others such as :-/ (unsure), :-O (surprised) and ;-) (wink). Notice the addition of a nose formed with the hyphen key.

• Generation Xers (1964-1980) use the most emoticons of the older three generations. They include unusual emoticons such as >--(°> (dead fish) and :^p (sticking out tongue), and even emoticons meant sexually, such as (o) for breasts. Noses are often present, usually with a caret ^ in place of a hyphen, although hyphens are prevalent as well.

• It is with Generation Y (1981-2000) that we see the greatest change in emoticons, where the “faces” move from sideways to forward-facing, taken from the Japanese kaomoji. Compare the symbol for wink between Generations X and Y: ;^) and (0_-).


Silent Generation Wink: ;)

Baby Boomer Wink: ;-)

Generation X Wink: ;^)

Generation Y Wink: (O_-)
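The wink comparison above can be expressed as a simple lookup, shown here as a rough Python sketch. The exact glyphs are reconstructed from the comparison and should be treated as assumptions, as should the name `wink_styles`; a single emoticon is at best a weak hint about a writer's generation, not a classification.

```python
# Wink variants from the comparison above, mapped to the generation the
# paper associates with each style (glyphs are assumed reconstructions).
WINK_STYLES = {
    ";)": "Silent Generation",
    ";-)": "Baby Boomer",
    ";^)": "Generation X",
    "(O_-)": "Generation Y",
}


def wink_styles(text):
    """Return the generation labels for every wink variant found in text."""
    return [label for emoticon, label in WINK_STYLES.items() if emoticon in text]


print(wink_styles("see you at the track ;^)"))
print(wink_styles("got the new phone (O_-)"))
```

In practice such hints would be combined with other signals, such as spelling style and the slang terms discussed earlier, before drawing any conclusion about the writer.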

Gender Analysis

Gender analysis may be unfamiliar to most, and many may even question why it is necessary. The reason is simple: comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to change drastically and suddenly. In the latter case, there is the distinct possibility of profile fraud.

Some individuals may pretend to be the opposite gender for various reasons: to pretend to be another person for a prank, to assume the identity of another for fraudulent reasons, to pretend to be the opposite gender for sexual reasons, or to pretend to be another for undercover work, as in vice-squad or detective work.

Solutions

Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps.

The following sources can be used to determine additional relevant keywords:

• The Urban Dictionary, http://urbandictionary.com: Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. It is excellent for typing in industry terms to see if variations exist and where they are used regionally.

• Google Insights, http://google.com/insights/search: Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days to as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts, and news headlines plotted on trend lines.

• Google AdWords, http://adwords.google.com: AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.

• Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.

• Gender Genie, http://bookblog.net/gender/genie.php: A free tool that attempts to identify the gender of a writer when text is pasted into a field and the algorithm is run.

With these additional keywords, misspellings, slang, nicknames and regional variants, the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand.

Benefits

It is no longer an option to be naïve enough to believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, and preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.

Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring.

By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and can glean far more learnings from its customer base.


Case Study: Chevrolet Cobalt

The phenomenon of linguistic variants was first noticed by Linabury and Macemore in 2007, and described to General Motors, while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing (along with Honda Accords) into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.). The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers to non-Chevrolet owners.

Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states. During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt.” Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt.” C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced) it is possible that engineering names are known externally.

Macemore then theorized that these terms were surfacing often enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.
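For readers who want to reproduce this kind of figure from their own monitoring data, the growth of a conversation set is the difference between the new and old counts, divided by the old count. The short Python sketch below uses hypothetical counts chosen only to reproduce the 53% figure; the actual conversation counts are not given in this paper.

```python
def percent_increase(before, after):
    """Percentage growth from `before` to `after` conversations."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) * 100 / before


# Hypothetical counts, chosen only to reproduce the 53% figure.
print(percent_increase(1000, 1530))  # 53.0
```

By the same formula, an increase of more than 100% means the expanded keyword set found more than twice as many conversations as the original set.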


Page 5: Linguistic challenges associated with monitoring social media

Target MarketBased on usage trends the target audience for social media monitoring applications can be divided into two main segments internalandexternal

Categorically corporations and public relations firms tend to use monitoring tools for internally-driven ends These typically include reputation management crisis management and as a clipping service to capture media mentions

Keyword strategies for these approaches are typically limited to formal brand names the CEOrsquos name and associated marketing terminology They rarely take into consideration linguistic variations context or subtle sentiment variations

Inversely advertising agencies researchers and social media agencies tend to retain an external focus to their monitoring efforts concentrating on sentiment analysis brand perception and marketing effectiveness and awareness

External monitoring tends to consider contextual relevance far more than PR firms do but most still lack the incorporation of (or even existence of) language variants that need to be considered for accurate and inclusive brand monitoring in the social space

5

Business ChallengeThe User Base Grows AnnuallyNearly 40 of corporations are turning to social monitoring to keep abreast of whatrsquos being said about their brand Many take this on internally but most hire outside social media companies or agencies However virtually none of them are aware that they are not seeing the entire conversation and blindly put faith in their chosen monitoring tool that it will fulfill their needs and find all of the relevant online discussions about their brand product or services

This is not the case The tools are limited by the thoroughness of the toolrsquos operator and how much time is spent determining appropriate keywords Most administrators make the assumption that the terms they use as marketing descriptors (eg marketing copy search terms and PR copy) are enough

Many monitoring tools are set up for the corporation by the tool manufacturer It is highly unlikely the tool creator could understand a brand as well as the employees agencies or long-standing vendors of the corporation

The reality is the marketing descriptors are generally one-sided somewhat aspirational and rarely match customer expectations and perceptions Few companies use keywords describing themselves as ldquocheaprdquo ldquoaveragerdquo ldquoacceptablerdquo ldquopoorrdquo ldquopatheticrdquo ldquogood enoughrdquo etcrdquo however those are precisely the terms consumers use with respect to brands For proof of this one need only see how brands are described at BrandTagsnet where tens of thousands of consumers have used those exact terms to describe hundreds of corporations in ever-growing tag clouds of user-generated terms

6

Tagcloud about Chrysler from BrandTagsnet

The Problem with Monitoring LanguageLanguages constantly evolve They evolve nationally regionally and hyper-locally For example a popular phrase among teens nationally to describe something amazing is ldquooff the hookrdquo Regional variants such as ldquooff the chainrdquo [Detroit] and ldquooff the heezyrdquo [Brooklyn] exist as well Hyper-locally a neighborhood may have yet another variant shared among friends but not generally known outside that block

This presents unique challenges to the researcher who is using social media monitoring tools If a phrase is known it will be used as a key search term for the tool to use If however more people are using lesser known regional variants the tool loses effectiveness

1337speakThere exist several linguistic phenomenon online that do not exist offline One is the well-known variant known as hacker speak or ldquo1337speakrdquo (Elite speak) This variation goes back more than a decade online It was developed by computer hackers in an effort to make their messages to each other difficult to read by outsiders Words are deconstructed to their visual elements and replaced with alpha-numeric and punctuation equivalents that bear a passing resemblance to the original letter form

For example a capital lsquoTrsquo may be replaced with the number 7 or a + sign The word lsquoatrsquo will be replaced with the sign Capital lsquoErsquo becomes a 3 and so on There is no sequencing to the replacements it is simply a matter of finding letters numbers and punctuation that can be substituted Indeed cleverness is praised and while online ldquo1337speak generatorsrdquo exist which ldquotranslaterdquo text back and forth between English and 1337speak each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt

Here is a sample sentence in English first then 1337speak ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquo71M3 M464z1n3rsquo5 |23p0|273|2 H4| n0 1|34 wH47 w3 w3|23 4f73|2rdquo

7

If hackers were discussing a new Intel processor in 1337speak no monitoring tool would be able to pick up that conversation as no complete English words exist in hacker speak for the tool to pick up

LOLCATSSites such as General Mayhem 4chanorg and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants known as LOLCATS (pronounced ldquoLAHL catsrdquo) The meme originated as a series of cute pictures of kittens doing things with the accompanying text purported to be the voice of the cat Cats according to the meme have unique spellings of English poor grammar and prefer the ldquoImpactrdquo font Eventually kids began using LOLCATS as an accepted form of writing in text messages instant messages email and even speech

Like 1337speak LOLCATS speak can be difficult if not impossible for monitoring tools to parse as plain English Consider the sentence used for the 1337speak example in English then in LOLCATS

ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquoTIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZrdquo

Intentional MisspellingsFinally Generation Y general do not spell correctly sometimes out of laziness sometimes mdash like hackers mdashto intentionally disguise their messages from authority figures This may not matter to a company monitoring the conversations of senior citizens but if the target audience is the highly sought after 18-24 crowd it is an issue that must be understood Here is a real example found on MySpace from a 16 year-old girl to her friends

ldquoHAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHINGrdquo

8

Generational Differences in EmoticonsMicroblogging platforms like Twitter and Foursquare which necessitate short messaging seem almost devoid of emoticons It is our theory that hashtagsmdashshort linked codes preceded by the pound sign ()mdashtake the place of emoticons on microblogging as many hashtags are used sarcastically such as whatever or ilovemylife

There are distinct differences between the types of emoticons created by the different birth generations in the United States Notice that with each generation the ldquofacesrdquo become slightly more realistic

bull The so-called SilentGeneration (1925-1945) are the least likely to use emoticons in speech other than the most basic (smiles and frowns)

bull The BabyBoomers (1946-1963) use emoticons sparingly but nevertheless use more than just -) and -( symbols They will include others such as - (unsure) -O (surprised) and -) (wink) Notice the addition of a nose formed with the hyphen key

bull GenerationXers (1964-1980) use the most emoticons of the older three generations They include unusual emoticons such as gt--(deggt (dead fish) and ^p (sticking out tongue) even emoticons meant sexually such as (o) for breasts Noses are often present usually with a carat ^ in place of a hyphen although hyphens are prevalent as well

bull It is with GenerationY (1981-2000) that we see the greatest change in emoticons where the ldquofacesrdquo move from sideways to forward facing taken from the Japanese kaomoji Compare the symbol for wink between Generations X and Y ^) and (0_-)

9

Silent Generation Wink

)Baby Boomer Wink

-)Generation X Wink

^)Generation Y Wink

(O_-)

Gender Analysis Gender Analysis may be unfamiliar to most and many may question even why it is necessary The reason is simple Comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to drastically change suddenly In the latter case there is the distinct possibility of profile fraud

Some individuals may pretend to be the opposite gender for various reasons to pretend to be another person for a prank to assume the identity of another for fraudulent reasons to pretend to be the opposite gender for sexual reasons to pretend to be another for undercover work as in vice-squad or detective work

Solutions

Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps.

The following sources can be used to determine additional relevant keywords:

• The Urban Dictionary (http://urbandictionary.com): Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. It is excellent for typing in industry terms to see whether variations exist and where, regionally, they are used.

• Google Insights (http://google.com/insights/search): Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days to as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts, and news headlines plotted on trend lines.

• Google AdWords (http://adwords.google.com): AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.

• Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.

• Gender Genie (http://bookblog.net/gender/genie.php): A free tool that estimates the gender of a writer. Paste text into a field and run the algorithm.

With these additional keywords (misspellings, slang, nicknames and regional variants), the new keyword list will not only yield more data, but will finally tell the whole consumer story surrounding the brand.

Benefits

It is no longer an option to be naïve enough to believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.

Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring.

By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and glean far more learnings from its customer base.

Case Study: Chevrolet Cobalt

The phenomenon of linguistic variants was first noticed by Linabury and Macemore, and described to General Motors, in 2007 while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing, along with Honda Accords, into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.). The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers of non-Chevrolet owners.

Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states. During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt.” Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt.” C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced) it is possible that engineering names are known externally.

Macemore then theorized that these terms were surfacing enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 50%, and the client gained insight and learnings into how its vehicles were referred to by the most influential purchasers of its product.
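The lift reported in this case study is a simple before-and-after comparison of matched conversations. The sketch below shows one way to measure it. The sample posts are invented for illustration, the function names are hypothetical, and whole-word matching is an assumption about how a monitoring tool might treat keywords.

```python
import re


def count_matches(posts, keywords):
    """Count posts that contain at least one keyword as a whole word."""
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords
    ]
    return sum(any(p.search(post) for p in patterns) for post in posts)


def percent_increase(before, after):
    """Percentage growth in matched conversations from before to after."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) * 100.0 / before


# Made-up sample posts, for illustration only.
posts = [
    "Just lowered my Cobalt, looks mean",
    "Anyone running a turbo on their Balt?",
    "My C-Balt needs new rims",
    "Cobalt SS or Civic Si?",
    "Saw a clean C-Car at the meet last night",
]

base = count_matches(posts, ["cobalt"])
expanded = count_matches(posts, ["cobalt", "balt", "c-balt", "c-car"])
print(base, expanded, percent_increase(base, expanded))
```

Whole-word matching keeps a short nickname such as "Balt" from matching inside unrelated words, at the cost of missing run-together spellings. Which trade-off is right depends on the forum being monitored.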

Case Study: OnStar™

OnStar™ is a multimillion-dollar company that produces a telematics system for vehicles. As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents, OnStar™’s corporate marketing team wanted up-to-the-minute reports on what their subscribers, their detractors and the media were saying. In 2007, OnStar™ hired Campbell-Ewald’s Social Media Team to monitor conversations and report back weekly with findings, and daily with any outstanding conversations or topics.

Campbell-Ewald’s Social Media Team quickly discovered there would be a few barriers to accurate monitoring. For example, people discussing certain television shows were appearing in the feed, with sentences like “Did you see what happened on Star Search last night?” or “There was one episode on Star Trek where…” These false positives were quickly weeded out through exclusionary phrases added to the keyword set.

The team also discovered linguistic variants of OnStar™ appearing in the conversations of loyal fans and influencers, which included several hackers. Some hackers were tweaking OnStar™ at home (similar to the jail-breaking of iPhones) for fun. We found that they used numerous variants of OnStar™, including On, On Star, On_Star, NOnStar, ONStar, OffStar, On-Star, OnsStar and BlondeStar (in reference to a YouTube parody of OnStar™).

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 109%, and the client gained insight and learnings into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base: hackers.
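The two fixes in this case study, adding brand variants and adding exclusionary phrases, can be illustrated together. This is a rough sketch rather than the team's actual configuration: the regular expressions are assumptions built from the variants and false positives named above, and the bare variant "On" is left out because it would match far too much.

```python
import re

# Variants named in the case study. The optional separator lets one
# pattern cover "OnStar", "On Star", "On_Star" and "On-Star".
VARIANT = re.compile(
    r"\b(?:n?on[\s_-]?star|onsstar|offstar|blondestar)\b", re.IGNORECASE
)

# Exclusionary phrases: television titles that follow the word "on".
EXCLUDE = re.compile(r"\bon\s*star\s+(?:search|trek)\b", re.IGNORECASE)


def mentions_onstar(post):
    """True if the post uses a known variant and is not a known false positive."""
    if EXCLUDE.search(post):
        return False
    return VARIANT.search(post) is not None


feed = [
    "My On_Star unit got hacked",
    "Did you see what happened on Star Search last night?",
    "That BlondeStar parody is hilarious",
]
print([post for post in feed if mentions_onstar(post)])
```

Two limits are worth noting. A phrase such as "come on, star player" would still slip through, and a post that mentions both the product and one of the shows is dropped entirely. Exclusion lists need the same ongoing review as keyword lists.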

Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate.

By adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’, the feed examples are more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows us that consumers are also searching for it as a ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5,000%.

[Screenshots: Radian6 (two views), Google Insights, Urban Dictionary]
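The Dell Mini 3 walkthrough amounts to three lists: topic phrases, narrowing context terms, and exclusions. The sketch below models that idea in plain Python. It does not reflect Radian6's interface or data model, and the class name, field names and sample posts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


def contains_any(text: str, terms: List[str]) -> bool:
    """Case-insensitive substring test against a list of terms."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


@dataclass
class KeywordSet:
    """Hypothetical keyword profile; not Radian6's actual data model."""
    topic: List[str]                                    # at least one must appear
    context: List[str] = field(default_factory=list)    # narrows the topic
    exclude: List[str] = field(default_factory=list)    # none may appear

    def matches(self, post: str) -> bool:
        if not contains_any(post, self.topic):
            return False
        if contains_any(post, self.exclude):
            return False
        return not self.context or contains_any(post, self.context)


# Terms taken from the walkthrough above; "dell mini 5" is left out
# because it names a different model.
dell_mini_3 = KeywordSet(
    topic=["dell mini 3", "cellular dell", "dell android", "dell smartphone"],
    context=["cellphone", "phone", "cellie", "android", "cellular"],
    exclude=["laptop"],
)

print(dell_mini_3.matches("The Dell Mini 3 cellphone looks slick"))
print(dell_mini_3.matches("My Dell Mini 3 laptop battery died"))
```

Keeping the three lists separate makes it easy to add a newly discovered slang term such as "cellie" without touching the exclusions that keep laptop chatter out of the feed.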

Summary

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to an understanding of the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media Team addressed these linguistic challenges in its own client monitoring projects over the past five years, utilizing the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.

Contact

Dave Linabury, Group Director, Social Media
DaveLinabury@c-e.com

Jason Macemore, Digital Strategist
JasonMacemore@c-e.com

Gary Olson, Senior Social Media Planner
GaryOlson@c-e.com

Campbell-Ewald
30400 Van Dyke Ave., Warren, Michigan 48093
+1 (586) 574-3400
http://c-e.com

Page 7: Linguistic challenges associated with monitoring social media

The Problem with Monitoring LanguageLanguages constantly evolve They evolve nationally regionally and hyper-locally For example a popular phrase among teens nationally to describe something amazing is ldquooff the hookrdquo Regional variants such as ldquooff the chainrdquo [Detroit] and ldquooff the heezyrdquo [Brooklyn] exist as well Hyper-locally a neighborhood may have yet another variant shared among friends but not generally known outside that block

This presents unique challenges to the researcher who is using social media monitoring tools If a phrase is known it will be used as a key search term for the tool to use If however more people are using lesser known regional variants the tool loses effectiveness

1337speakThere exist several linguistic phenomenon online that do not exist offline One is the well-known variant known as hacker speak or ldquo1337speakrdquo (Elite speak) This variation goes back more than a decade online It was developed by computer hackers in an effort to make their messages to each other difficult to read by outsiders Words are deconstructed to their visual elements and replaced with alpha-numeric and punctuation equivalents that bear a passing resemblance to the original letter form

For example a capital lsquoTrsquo may be replaced with the number 7 or a + sign The word lsquoatrsquo will be replaced with the sign Capital lsquoErsquo becomes a 3 and so on There is no sequencing to the replacements it is simply a matter of finding letters numbers and punctuation that can be substituted Indeed cleverness is praised and while online ldquo1337speak generatorsrdquo exist which ldquotranslaterdquo text back and forth between English and 1337speak each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt

Here is a sample sentence in English first then 1337speak ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquo71M3 M464z1n3rsquo5 |23p0|273|2 H4| n0 1|34 wH47 w3 w3|23 4f73|2rdquo

7

If hackers were discussing a new Intel processor in 1337speak no monitoring tool would be able to pick up that conversation as no complete English words exist in hacker speak for the tool to pick up

LOLCATSSites such as General Mayhem 4chanorg and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants known as LOLCATS (pronounced ldquoLAHL catsrdquo) The meme originated as a series of cute pictures of kittens doing things with the accompanying text purported to be the voice of the cat Cats according to the meme have unique spellings of English poor grammar and prefer the ldquoImpactrdquo font Eventually kids began using LOLCATS as an accepted form of writing in text messages instant messages email and even speech

Like 1337speak LOLCATS speak can be difficult if not impossible for monitoring tools to parse as plain English Consider the sentence used for the 1337speak example in English then in LOLCATS

ldquoTime Magazinersquos reporter had no idea what we were afterrdquo ldquoTIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZrdquo

Intentional MisspellingsFinally Generation Y general do not spell correctly sometimes out of laziness sometimes mdash like hackers mdashto intentionally disguise their messages from authority figures This may not matter to a company monitoring the conversations of senior citizens but if the target audience is the highly sought after 18-24 crowd it is an issue that must be understood Here is a real example found on MySpace from a 16 year-old girl to her friends

ldquoHAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHINGrdquo

8

Generational Differences in EmoticonsMicroblogging platforms like Twitter and Foursquare which necessitate short messaging seem almost devoid of emoticons It is our theory that hashtagsmdashshort linked codes preceded by the pound sign ()mdashtake the place of emoticons on microblogging as many hashtags are used sarcastically such as whatever or ilovemylife

There are distinct differences between the types of emoticons created by the different birth generations in the United States Notice that with each generation the ldquofacesrdquo become slightly more realistic

bull The so-called SilentGeneration (1925-1945) are the least likely to use emoticons in speech other than the most basic (smiles and frowns)

bull The BabyBoomers (1946-1963) use emoticons sparingly but nevertheless use more than just -) and -( symbols They will include others such as - (unsure) -O (surprised) and -) (wink) Notice the addition of a nose formed with the hyphen key

bull GenerationXers (1964-1980) use the most emoticons of the older three generations They include unusual emoticons such as gt--(deggt (dead fish) and ^p (sticking out tongue) even emoticons meant sexually such as (o) for breasts Noses are often present usually with a carat ^ in place of a hyphen although hyphens are prevalent as well

bull It is with GenerationY (1981-2000) that we see the greatest change in emoticons where the ldquofacesrdquo move from sideways to forward facing taken from the Japanese kaomoji Compare the symbol for wink between Generations X and Y ^) and (0_-)


Silent Generation Wink: ;)

Baby Boomer Wink: ;-)

Generation X Wink: ;^)

Generation Y Wink: (O_-)
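The structural differences described above (no nose, hyphen nose, caret nose, forward-facing face) could be detected automatically. The following is a minimal sketch under our own assumptions: the rule names and regular expressions are hypothetical, derived only from the four wink examples, and a matching label describes the style of the emoticon rather than the verified age of the writer.

```python
import re

# Hypothetical style rules, checked in order from most to least specific.
# The patterns are illustrative assumptions, not a validated classifier.
STYLE_RULES = [
    ("gen_y", re.compile(r"\(.+_.+\)")),   # forward-facing face, e.g. (O_-)
    ("gen_x", re.compile(r"[:;]\^.")),     # sideways face with a caret nose
    ("boomer", re.compile(r"[:;]-.")),     # sideways face with a hyphen nose
    ("silent", re.compile(r"[:;].")),      # sideways face with no nose
]


def emoticon_style(token):
    """Return the label of the first rule that matches the whole token."""
    token = token.strip()
    for label, pattern in STYLE_RULES:
        if pattern.fullmatch(token):
            return label
    return "unknown"


for wink in [";)", ";-)", ";^)", "(O_-)"]:
    print(wink, emoticon_style(wink))
```

A production version would need a much larger emoticon inventory; this sketch only shows that the generational cues are regular enough to encode as rules.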

Gender Analysis

Gender analysis may be unfamiliar to most, and many may even question why it is necessary. The reason is simple: comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to change drastically and suddenly. In the latter case, there is the distinct possibility of profile fraud.

Some individuals may pretend to be the opposite gender for various reasons: to pretend to be another person for a prank, to assume the identity of another for fraudulent reasons, to pretend to be the opposite gender for sexual reasons, or to pretend to be another for undercover work, as in vice-squad or detective work.

Solutions

Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps.

The following sources can be used to determine additional relevant keywords:

• The Urban Dictionary (http://urbandictionary.com): Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. It is excellent for typing in industry terms to see if variations exist and where, regionally, they are used.

• Google Insights (http://google.com/insights/search): Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days to as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts, and news headlines plotted on trend lines.


• Google AdWords (http://adwords.google.com): AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.

• Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.

• Gender Genie (http://bookblog.net/gender/genie.php): A free tool that predicts the gender of a writer. Paste text into a field and run the algorithm.

With these additional keywords, misspellings, slang, nicknames and regional variants, the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand.

Benefits

It is no longer an option to be naïve enough to believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.

Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring.

By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and can glean far more learnings from its customer base.


Case Study: Chevrolet Cobalt

The phenomenon of linguistic variants was first noticed by Linabury and Macemore in 2007, and described to General Motors, while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing (along with Honda Accords) into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.). The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers to non-Chevrolet owners.

Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states. During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt.” Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt.” C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced) it is possible that engineering names are known externally.

Macemore then theorized that these terms were surfacing often enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 50%, and the client gained insight and learnings into how its vehicles were referred to by the most influential purchasers of its product.
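The effect of an expanded keyword set can be measured with a simple before-and-after count. The sketch below is illustrative only: the sample posts are made up, the function names are our own, and whole-word matching is an assumption about how a monitoring tool might match terms (it keeps “Balt” from matching unrelated words such as “Baltimore”). The toy numbers do not reproduce the figures from the case study.

```python
import re


def compile_keywords(keywords):
    """Build one case-insensitive, whole-word pattern from a keyword list."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def count_matches(posts, keywords):
    """Count the posts that mention at least one keyword."""
    pattern = compile_keywords(keywords)
    return sum(1 for post in posts if pattern.search(post))


def percent_increase(before, after):
    """Relative growth of the matched set, as a percentage."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return (after - before) / before * 100.0


# Made-up sample posts, for illustration only.
posts = [
    "Just lowered my Cobalt, looks sick",
    "Anyone running a turbo on their Balt?",
    "C-Balt meet this Saturday",
    "New intake on the C-Car",
    "Cobalt SS for sale",
    "My Civic needs new tires",
]

base_keywords = ["cobalt"]
expanded_keywords = base_keywords + ["balt", "c-balt", "c-car"]

base_count = count_matches(posts, base_keywords)
expanded_count = count_matches(posts, expanded_keywords)
print(base_count, expanded_count, percent_increase(base_count, expanded_count))
```

Running the same comparison on a real feed, before committing new terms to the monitoring tool, gives a quick indication of whether a nickname is worth adding.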


Case Study: OnStar™

OnStar™ is a multimillion-dollar company that produces a telematics system for vehicles. As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents, OnStar™’s corporate marketing team wanted up-to-the-minute reports on what their subscribers, their detractors and the media were saying. In 2007, OnStar™ hired Campbell-Ewald’s Social Media Team to monitor conversations and report back weekly with findings, and daily with any outstanding conversations or topics.

Campbell-Ewald’s Social Media Team quickly discovered there would be a few barriers to accurate monitoring. For example, people discussing certain television shows were appearing in the feed, with sentences like “Did you see what happened on Star Search last night?” or “There was one episode on Star Trek where…” These false positives were quickly weeded out through exclusionary phrases added to the keyword set.

The team also discovered linguistic variants of OnStar™ appearing in the conversations of loyal fans and influencers, which included several hackers. Some hackers were tweaking OnStar™ at home (similar to the jailbreaking of iPhones) for fun. We found that they used numerous variants of OnStar™, including On, On Star, On_Star, NOnStar, ONStar, OffStar, On-Star, OnsStar and BlondeStar (in reference to a YouTube parody of OnStar™).

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 109%, and the client gained insight and learnings into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base: hackers.
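The two adjustments in this case study, exclusionary phrases and spelling variants, can be sketched as a pair of patterns. This is a hedged illustration, not the configuration used for the client: the regular expressions and the function name are our own assumptions, and the bare variant “On” is left out because it would match far too much on its own.

```python
import re

# Separator-tolerant spellings ("OnStar", "On Star", "On_Star", "On-Star"),
# plus nicknames that have to be listed explicitly.
VARIANTS = re.compile(
    r"\b(?:on[\s_-]?star|nonstar|offstar|onsstar|blondestar)\b",
    re.IGNORECASE,
)

# Exclusionary phrases for the television false positives.
EXCLUSIONS = re.compile(r"\bon\s*star\s+(?:search|trek)\b", re.IGNORECASE)


def is_relevant(post):
    """True if the post mentions a known variant and no exclusionary phrase."""
    return bool(VARIANTS.search(post)) and not EXCLUSIONS.search(post)


print(is_relevant("finally got my On_Star hacked"))
print(is_relevant("Did you see what happened on Star Search last night"))
```

Note the trade-off in this design: a post that mentions both the product and one of the television shows is dropped, so exclusionary phrases should be kept as narrow as possible.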


Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate.

By adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop,’ the feed examples are more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows us that consumers are also searching for it as ‘cellular dell,’ ‘dell android,’ ‘dell android phone,’ ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5,000%.
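The walkthrough above can be summarized as a keyword set with three parts: topic terms, narrowing terms and exclusionary terms. The sketch below is a hypothetical model written for illustration. The class name, field names and matching logic are our own assumptions and do not represent Radian6’s data model or query syntax.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordSet:
    """Hypothetical topic/context/exclude keyword set (not Radian6's model)."""
    topic: set = field(default_factory=set)    # at least one must appear
    context: set = field(default_factory=set)  # if non-empty, at least one must appear
    exclude: set = field(default_factory=set)  # none may appear

    def matches(self, text):
        lowered = text.lower()

        def has_any(terms):
            return any(term in lowered for term in terms)

        if not has_any(self.topic):
            return False
        if self.context and not has_any(self.context):
            return False
        return not has_any(self.exclude)


# Starting set from the example: topic, narrowing keyword, exclusionary keyword.
dell = KeywordSet(topic={"dell mini 3"}, context={"cellphone"}, exclude={"laptop"})

post = "my new cellie is a dell android"
before = dell.matches(post)

# Terms surfaced by Google Insights and the Urban Dictionary (Mini 5 left out).
dell.topic |= {"cellular dell", "dell android", "dell android phone", "dell smartphone"}
dell.context |= {"cellie"}
after = dell.matches(post)

print(before, after)
```

In this sketch the sample post is missed by the starting set and found by the expanded one. Requiring a context term is a deliberately strict choice; a post that says only “dell smartphone” would still be missed unless that phrase were also treated as sufficient on its own.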

[Figures: Radian6 (two panels), Google Insights, Urban Dictionary]


Summary

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to an understanding of the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media team addressed these linguistic challenges in their own client monitoring projects over the past five years, utilizing the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.


Contact

Dave Linabury, Group Director, Social Media, DaveLinabury@c-e.com

Jason Macemore, Digital Strategist, JasonMacemore@c-e.com

Gary Olson, Senior Social Media Planner, GaryOlson@c-e.com

Campbell-Ewald, 30400 Van Dyke Ave., Warren, Michigan 48093, +1 (586) 574-3400, http://c-e.com


If hackers were discussing a new Intel processor in 1337speak, no monitoring tool would be able to pick up that conversation, as no complete English words exist in hacker speak for the tool to pick up.

LOLCATS

Sites such as General Mayhem, 4chan.org and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants, known as LOLCATS (pronounced “LAHL cats”). The meme originated as a series of cute pictures of kittens doing things, with the accompanying text purported to be the voice of the cat. Cats, according to the meme, have unique spellings of English and poor grammar, and prefer the “Impact” font. Eventually, kids began using LOLCATS as an accepted form of writing in text messages, instant messages, email and even speech.

Like 1337speak, LOLCATS speak can be difficult, if not impossible, for monitoring tools to parse as plain English. Consider the sentence used for the 1337speak example, in English and then in LOLCATS:

“Time Magazine’s reporter had no idea what we were after.”
“TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZ”


Page 9: Linguistic challenges associated with monitoring social media

Generational Differences in EmoticonsMicroblogging platforms like Twitter and Foursquare which necessitate short messaging seem almost devoid of emoticons It is our theory that hashtagsmdashshort linked codes preceded by the pound sign ()mdashtake the place of emoticons on microblogging as many hashtags are used sarcastically such as whatever or ilovemylife

There are distinct differences between the types of emoticons created by the different birth generations in the United States Notice that with each generation the ldquofacesrdquo become slightly more realistic

bull The so-called SilentGeneration (1925-1945) are the least likely to use emoticons in speech other than the most basic (smiles and frowns)

bull The BabyBoomers (1946-1963) use emoticons sparingly but nevertheless use more than just -) and -( symbols They will include others such as - (unsure) -O (surprised) and -) (wink) Notice the addition of a nose formed with the hyphen key

bull GenerationXers (1964-1980) use the most emoticons of the older three generations They include unusual emoticons such as gt--(deggt (dead fish) and ^p (sticking out tongue) even emoticons meant sexually such as (o) for breasts Noses are often present usually with a carat ^ in place of a hyphen although hyphens are prevalent as well

bull It is with GenerationY (1981-2000) that we see the greatest change in emoticons where the ldquofacesrdquo move from sideways to forward facing taken from the Japanese kaomoji Compare the symbol for wink between Generations X and Y ^) and (0_-)

9

Silent Generation Wink

)Baby Boomer Wink

-)Generation X Wink

^)Generation Y Wink

(O_-)

Gender Analysis Gender Analysis may be unfamiliar to most and many may question even why it is necessary The reason is simple Comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to drastically change suddenly In the latter case there is the distinct possibility of profile fraud

Some individuals may pretend to be the opposite gender for various reasons to pretend to be another person for a prank to assume the identity of another for fraudulent reasons to pretend to be the opposite gender for sexual reasons to pretend to be another for undercover work as in vice-squad or detective work

SolutionsRelying solely on internal industry and marketing keywords will not suffice It is crucial to take additional steps

The following sources be used to determine additional relevant keywords

bull TheUrbanDictionary httpurbandictionarycom Continually updated the Urban Dictionary is easily the largest source of regional national and international slang on the Internet Excellent for typing in industry terms to see if variations exist and regionally where they are used

bull GoogleInsightshttpgooglecominsightssearch Google Insights allow searches to go from global down to individual cities with timeframes from the last 30 days as far back as 2004 They provide trends on rising search patterns based on the root key term maps indicating geo-density forecasts and news headlines plotted on trend lines

10

bull GoogleAdWords httpadwordsgooglecom AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys The tool can also be used to help select better keywords for social media monitoring Keywords are shown by the latest search patterns with search quantities displayed

bull InfluencersAsk active and influential customers for terms nicknames etc If your company does not have a personal relationship with its influencers find and read their blogs and tweets paying close attention to the responses from their audiences Flag unusual words spellings and abbreviations

bull GenderGeniehttp bookblognetgendergeniephp A free tool that can identify the gender of the writer by pasting text into a field and running the algorithm

With these additional keywords misspellings slang nicknames and regional variants the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand

BenefitsIt is no longer an option to be naiumlve enough to actually believe that no one is talking All brands are being discussed by someone Only through the proper configuration of professional-grade monitoring tools like Radian6mdashand preferably under the guidance of a social media agency that specializes in monitoring and analysismdashcan a company expect to truly know what is being said about their brand

Not knowing how your brand is being discussed and described means that brand is not getting the entire picture as is the case with the reports from PR agencies and most internal social media monitoring

By applying these techniques and using these additional tools a brand can be certain of seeing the full picture and glean far more learnings from their customer base

11

Case Study Chevrolet CobaltThe phenomenon of linguistic variants was first noticed and described by Linabury and Macemore in 2007 to General Motors while they were monitoring conversations pertaining to the Chevrolet Cobaltmdasha small car that young males were customizingmdashalong with Honda Accordsmdashinto street rods (known regionally as Rice Rods Rice Burners Rice Rockets etc) The assignment was to find out what these young men were saying about the Cobalt as they were deemed by Chevrolet to be influencers to non-Chevrolet owners

Campbell-Ewaldrsquos monitoring was confined geographically to the Great Lakes states During the course of the monitoring Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a ldquoBaltrdquo Linabury noticed that conversations on the West side of Michigan referred to it as a ldquoC-Carrdquo or ldquoC-Baltrdquo C-Car was the internal name of the vehicle used by engineers but in Michigan (where the car is produced) it is possible that engineering names are known externally

Macemore then theorized that these terms were surfac-ing enough that they should be added to the keywords the monitoring tool was using to spider conversations After adding the new terms the number of conversations found by the tool increased by 53 This led to speculation that theinfluentialmembersofasocialcirclemaybemorelikelytohaveinternalnicknamesthanthoseoutsidethatcircle and that these names needed to be identified at the outset of any social media monitoring assignment to en-sure accurate monitoring and the largest possible data set

Result By adding the additional terms that were manually identified the conversational data set increased by more than 50 and the client gained insight and learnings into how their vehicles were referred to by the most influential purchasers of their product

12

Case Study OnStartradeOnStartrade is a multimillion dollar company that produces a telematics system for vehicles As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents OnStartradersquos corporate marketing team wanted up to the minute reports on what their subscribers were saying their detractors and the media In 2007 OnStartrade hired Campbell-Ewaldrsquos Social Media Team to monitor conversations and report back with weekly findings and daily with any outstanding conversations or topics

Campbell-Ewaldrsquos Social Media Team quickly discovered there would be a few barriers to accurate monitoring For example people discussing certain television shows were appearing in the feed Sentences like ldquoDid you see what happened onStar Search last nightrdquo or ldquoThere was one episode onStar Trek wherehelliprdquo These false positives were quickly weeded out through exclusionary phrases added to the keyword set

The team also discovered linguistic variants of OnStartrade appearing in the conversations of loyal fans and influencers which included several hackers Some hackers were tweaking OnStartrade at home (similar to the jail-breaking of iPhones) for fun We found that they used numerous variants of OnStartrade including On On Star On_Star NOnStar ONStar OffStar On-Star OnsStar and BlondeStar (in reference to a YouTube parody of OnStartrade)

Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 109%, and the client gained insight into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base: hackers.


Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate.

Adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’ makes the feed examples more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows that consumers are also searching for it as ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates that any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5,000%.
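The selection step described here can be sketched in Python. The 5,000% threshold is the one stated in the text; the growth figures, the extra term ‘dell coupon’ and the function name `select_rising_terms` are hypothetical, since the paper does not report the underlying numbers.

```python
BREAKOUT_THRESHOLD_PCT = 5000  # growth above this is labelled a "Breakout"

def select_rising_terms(rising, skip=()):
    """Return rising search terms that qualify as breakouts, minus any to skip."""
    return [
        term
        for term, growth_pct in rising.items()
        if growth_pct > BREAKOUT_THRESHOLD_PCT and term not in skip
    ]

# Hypothetical growth percentages, invented for illustration only.
rising = {
    "cellular dell": 5200,
    "dell android": 7400,
    "dell android phone": 6100,
    "dell smartphone": 5600,
    "dell mini 5": 9000,   # a different model, so it is skipped below
    "dell coupon": 120,    # rising, but well below the breakout threshold
}

selected = select_rising_terms(rising, skip={"dell mini 5"})
print(selected)
```

Keeping the skip list explicit makes the judgment call about the Mini 5 visible rather than burying it in the keyword set.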

[Screenshots: Radian6 (two views), Google Insights, Urban Dictionary]


Summary

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to an understanding of the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media team addressed these linguistic challenges in their own client monitoring projects over the past five years, utilizing the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.


Contact

Dave Linabury, Group Director, Social Media
DaveLinaburyc-ecom

Jason Macemore, Digital Strategist
JasonMacemorec-ecom

Gary Olson, Senior Social Media Planner
GaryOlsonc-ecom

Campbell-Ewald
30400 Van Dyke Ave, Warren, Michigan 48093
+1 (586) 574-3400
http://c-e.com


Gender Analysis

Gender analysis may be unfamiliar to most, and many may even question why it is necessary. The reason is simple: comments may arise where either the screen name of the writer is ambiguous or the writing style of a known individual seems to change drastically and suddenly. In the latter case, there is the distinct possibility of profile fraud.

Some individuals may pretend to be the opposite gender for various reasons: to pretend to be another person for a prank; to assume the identity of another for fraudulent reasons; to pretend to be the opposite gender for sexual reasons; or to pretend to be another for undercover work, as in vice-squad or detective work.

Solutions

Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps.

The following sources can be used to determine additional relevant keywords:

• The Urban Dictionary (http://urbandictionary.com): Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. Excellent for typing in industry terms to see if variations exist and where they are used regionally.

• Google Insights (http://google.com/insights/search): Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts, and news headlines plotted on trend lines.


• Google AdWords (http://adwords.google.com): AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.

• Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.

• Gender Genie (http://bookblog.net/gender/genie.php): A free tool that estimates the gender of a writer; paste text into a field and run the algorithm.

With these additional keywords, misspellings, slang, nicknames and regional variants, the new keyword list will not only yield more data but will finally tell the whole consumer story surrounding the brand.
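Combining terms from several sources usually produces duplicates that differ only in case or spacing. The following Python sketch shows one way to merge them into a single keyword list; the function name `merge_keyword_sources` is hypothetical, the term ‘Chevy Cobalt’ is invented for illustration, and the remaining sample terms are borrowed from the Chevrolet Cobalt case study in this paper.

```python
def merge_keyword_sources(*sources):
    """Combine keyword lists, normalizing case and whitespace, dropping duplicates.

    The order of first appearance is preserved so that internal marketing
    terms stay ahead of the slang and nickname additions.
    """
    merged, seen = [], set()
    for source in sources:
        for term in source:
            key = " ".join(term.lower().split())
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged

internal = ["Chevrolet Cobalt", "Cobalt"]   # internal marketing keywords
slang = ["Balt"]                            # e.g. found via a slang dictionary
search_terms = ["cobalt", "Chevy  Cobalt"]  # e.g. found via search trends (hypothetical)
influencers = ["C-Car", "C-Balt", "balt"]   # e.g. gathered from influencer posts

merged = merge_keyword_sources(internal, slang, search_terms, influencers)
print(merged)
```

Normalizing before de-duplicating keeps the list short without discarding any genuinely different variant, such as ‘c-balt’ alongside ‘balt’.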

Benefits

It is no longer an option to be naïve enough to believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.

Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring.

By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and glean far more learnings from its customer base.


Case Study: Chevrolet Cobalt

The phenomenon of linguistic variants was first noticed by Linabury and Macemore in 2007, and described to General Motors, while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing (along with Honda Accords) into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.). The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers to non-Chevrolet owners.

Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states. During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt”. Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt”. C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced) it is possible that engineering names are known externally.

Page 13: Linguistic challenges associated with monitoring social media

Case Study OnStartradeOnStartrade is a multimillion dollar company that produces a telematics system for vehicles As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents OnStartradersquos corporate marketing team wanted up to the minute reports on what their subscribers were saying their detractors and the media In 2007 OnStartrade hired Campbell-Ewaldrsquos Social Media Team to monitor conversations and report back with weekly findings and daily with any outstanding conversations or topics

Campbell-Ewaldrsquos Social Media Team quickly discovered there would be a few barriers to accurate monitoring For example people discussing certain television shows were appearing in the feed Sentences like ldquoDid you see what happened onStar Search last nightrdquo or ldquoThere was one episode onStar Trek wherehelliprdquo These false positives were quickly weeded out through exclusionary phrases added to the keyword set

The team also discovered linguistic variants of OnStartrade appearing in the conversations of loyal fans and influencers which included several hackers Some hackers were tweaking OnStartrade at home (similar to the jail-breaking of iPhones) for fun We found that they used numerous variants of OnStartrade including On On Star On_Star NOnStar ONStar OffStar On-Star OnsStar and BlondeStar (in reference to a YouTube parody of OnStartrade)

Result By adding the additional terms that were manually identified the conversational data set increased by more than 109 and the client gained insight and learnings into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base hackers

13

Technical Specs

Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool Radian6.

In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor. Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate.

Adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’ makes the feed examples more targeted.

A search on Google Insights for ‘Dell Mini 3’ shows that consumers are also searching for it as ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’ and ‘dell mini 5’ (a different model).

A look at the Urban Dictionary indicates that any cellphone may be referred to as a “cellie” by youth.

These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords, as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant: breakouts represent a recent increase in search volume of more than 5,000%.
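The effect of widening the keyword set can be illustrated with a small Python sketch. It does not reflect Radian6’s interface or data model: the `KeywordSet` class, its field names and the brand-plus-context matching rule are assumptions made for this example, and substring matching stands in for whatever matching a real tool performs.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class KeywordSet:
    """Hypothetical include/exclude keyword set (not Radian6's data model)."""
    brand: str                 # must appear in the post
    context: Tuple[str, ...]   # at least one must appear alongside the brand
    exclude: Tuple[str, ...]   # any one of these rejects the post

    def matches(self, post: str) -> bool:
        text = post.lower()
        if self.brand not in text:
            return False
        if any(term in text for term in self.exclude):
            return False
        return any(term in text for term in self.context)


# Starting set: the topic plus 'cellphone', with 'laptop' excluded.
initial = KeywordSet(brand="dell", context=("mini 3", "cellphone"),
                     exclude=("laptop",))

# Terms suggested by Google Insights and the Urban Dictionary, as listed above.
expanded = replace(
    initial,
    context=initial.context + ("cellular", "android", "smartphone", "cellie"),
)
```

A post such as "Anyone tried the Dell android phone yet?" is missed by `initial` but caught by `expanded`, while a post about a Dell laptop is rejected by both.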

[Figures: screenshots from Radian6 (two), Google Insights and the Urban Dictionary]


Summary

Campbell-Ewald has been an active participant in social media since early 2006. Its lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. Developing these early tools to meet their own needs as researchers led them to understand the linguistic challenges raised in this paper.

Campbell-Ewald’s Social Media team addressed these linguistic challenges in its own client monitoring projects over the past five years, utilizing the following approaches:

• Determining current search trends around a topic

• Determining the age and gender of the writer

• Identifying the influencers and recording their linguistic patterns

• Identifying emoticons and comparing them to known regional and generational variants

It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6 under the guidance of experienced monitoring teams like those at Campbell-Ewald.

The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.


Contact

Dave Linabury, Group Director, Social Media
DaveLinabury@c-e.com

Jason Macemore, Digital Strategist
JasonMacemore@c-e.com

Gary Olson, Senior Social Media Planner
GaryOlson@c-e.com

Campbell-Ewald
30400 Van Dyke Ave., Warren, Michigan 48093
+1 (586) 574-3400
http://c-e.com
