Why We Twitter: Understanding Microblogging Usage and Communities

  • 8/14/2019 Why We Twitter: Understanding Microblogging Usage and Communities

    1/10

    Why We Twitter: Understanding Microblogging Usage and Communities

    Akshay Java
    University of Maryland Baltimore County
    1000 Hilltop Circle
    Baltimore, MD 21250, USA
    [email protected]

    Xiaodan Song
    NEC Laboratories America
    10080 N. Wolfe Road, SW3-350
    Cupertino, CA 95014, USA
    [email protected]

    Tim Finin
    University of Maryland Baltimore County
    1000 Hilltop Circle
    Baltimore, MD 21250, USA
    [email protected]

    Belle Tseng
    NEC Laboratories America
    10080 N. Wolfe Road, SW3-350
    Cupertino, CA 95014, USA
    [email protected]

    ABSTRACT
    Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool, has seen a lot of growth since it launched in October 2006. In this paper, we present our observations of the microblogging phenomenon by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze the user intentions associated at a community level and show how users with similar intentions connect with each other.

    Categories and Subject Descriptors
    H.3.3 [Information Search and Retrieval]: Information Search and Retrieval - Information Filtering; J.4 [Computer Applications]: Social and Behavioral Sciences - Economics

    General Terms
    Social Network Analysis, User Intent, Microblogging, Social Media

    1. INTRODUCTION
    Microblogging is a relatively new phenomenon, defined as "a form of blogging that lets you write brief text updates (usually less than 200 characters) about your life on the go and send them to friends and interested observers via text messaging, instant messaging (IM), email or the web."¹ It is provided by several services including Twitter², Jaiku³ and, more recently, Pownce⁴. These tools provide a light-weight, easy form of communication that enables users to broadcast and share information about their activities, opinions and status. One of the popular microblogging platforms is Twitter [29]. According to ComScore, within eight months of its launch, Twitter had about 94,000 users as of April 2007 [9]. Figure 1 shows a snapshot of the first author's Twitter homepage. Updates or posts are made by succinctly describing one's current status within a limit of 140 characters. Topics range from daily life to current events, news stories, and other interests. IM tools including Gtalk, Yahoo and MSN have features that allow users to share their current status with friends on their buddy lists. Microblogging tools make it easy to share status messages either publicly or within a social network.

    Figure 1: An example Twitter homepage with updates talking about daily experiences and personal interests.

    ¹ http://en.wikipedia.org/wiki/Micro-blogging
    ² http://www.twitter.com
    ³ http://www.jaiku.com
    ⁴ http://www.pownce.com

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Joint 9th WEBKDD and 1st SNA-KDD Workshop '07, August 12, 2007, San Jose, California, USA. Copyright 2007 ACM 1-59593-444-8... $5.00.

    Compared to regular blogging, microblogging fulfills a need for an even faster mode of communication. By encouraging shorter posts, it lowers the time and thought users must invest in generating content. This is also one of its main differentiating factors from blogging in general. The second important difference is the frequency of update. On average, a prolific blogger may update her blog once every few days; a microblogger, on the other hand, may post several updates in a single day.

    With the recent popularity of Twitter and similar microblogging systems, it is important to understand why and how people use these tools. Understanding this will help us evolve the microblogging idea and improve both microblogging client and infrastructure software. We tackle this problem by studying the microblogging phenomenon and analyzing different types of user intentions in such systems.

    Much of the research in user intention detection has focused on understanding the intent of search queries. According to Broder [5], the three main categories of search queries are navigational, informational and transactional. Understanding the intention behind a search query is very different from understanding user intention for content creation. In a survey of bloggers, Nardi et al. [26] describe different motivations for "why we blog". Their findings indicate that blogs are used as a tool to share daily experiences, opinions and commentary. Based on their interviews, they also describe how bloggers form communities online that may support different social groups in the real world. Lento et al. [21] examined the importance of social relationships in determining whether users would remain active in a blogging tool called Wallop. A user's retention and interest in blogging could be predicted by the comments received and continued relationships with other active members of the community. Users who are invited by people with whom they share pre-existing social relationships tend to stay longer and remain active in the network. Moreover, certain communities were found to have a greater retention rate due to the existence of such relationships. Mutual awareness in a social network has been found effective in discovering communities [23].

    In computational linguistics, researchers have studied the problem of recognizing the communicative intentions that underlie utterances in dialog systems and spoken language interfaces. The foundations of this work go back to Austin [2], Strawson [32] and Grice [14]. Grosz [15] and Allen [1] carried out classic studies analyzing the dialogues between people and between people and computers in cooperative, task-oriented environments. More recently, Matsubara [24] has applied intention recognition to improve the performance of an automobile-based spoken dialog system. While their work focuses on the analysis of ongoing dialogs between two agents in a fairly well defined domain, studying user intention in Web-based systems requires looking at both the content and the link structure.

    In this paper, we describe how users have adopted a specific microblogging platform, Twitter. Microblogging is relatively nascent and, to the best of our knowledge, no large-scale studies have been done on this form of communication and information sharing. We study the topological and geographical structure of Twitter's social network and attempt to understand the user intentions and community structure in microblogging. From our analysis, we find that the main types of user intentions are: daily chatter, conversations, sharing information and reporting news. Furthermore, users play different roles of information source, friend or information seeker in different communities.

    The paper is organized as follows: in Section 2, we describe the dataset and some of the properties of the underlying social network of Twitter users. Section 3 provides an analysis of Twitter's social network and its spread across geographies. Next, in Section 4 we describe aggregate user behavior and community-level user intentions. Section 5 provides a taxonomy of user intentions. Finally, we summarize our findings and conclude with Section 6.

    2. DATASET DESCRIPTION
    Twitter is currently one of the most popular microblogging platforms. Users interact with this system by using a Web interface, an IM agent or by sending SMS updates. Members may choose to make their updates public or available only to friends. If a user's profile is made public, her updates appear in a "public timeline" of recent updates. The dataset used in this study was created by monitoring this public timeline for a period of two months, from April 01, 2007 to May 30, 2007. A set of recent updates was fetched once every 30 seconds. There are a total of 1,348,543 posts from 76,177 distinct users in this collection.

    Twitter allows a user, A, to "follow" updates from other members who are added as "friends". An individual who is not a friend of user A but follows her updates is known as a "follower". Thus friendships can either be reciprocated or one-way. By using the Twitter developer API⁵, we fetched the social network of all users. We construct a directed graph G(V, E), where V represents the set of users and E represents the set of "friend" relations. A directed edge e exists between two users u and v if user u declares v as a friend. There are a total of 87,897 distinct nodes with 829,053 friend relations between them. There are more nodes in this graph than users in the post collection because some users discovered through the link structure do not have any posts during the period in which the data was collected. For each user, we also obtained their profile information and mapped their location to a geographic coordinate, details of which are provided in the following section.
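The graph construction described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy user names are hypothetical, and the input is assumed to be a list of (u, v) pairs meaning "u declares v as a friend".

```python
from collections import defaultdict

def build_friend_graph(friend_pairs):
    """Build a directed graph from (u, v) pairs meaning 'u declares v as a friend'."""
    successors = defaultdict(set)   # u -> set of users that u lists as friends
    nodes = set()
    for u, v in friend_pairs:
        successors[u].add(v)
        nodes.update((u, v))        # v becomes a node even if it never appears as a source
    return nodes, dict(successors)

def edge_count(successors):
    """Number of distinct directed friend relations."""
    return sum(len(targets) for targets in successors.values())

# Toy data with hypothetical user names.
pairs = [("alice", "bob"), ("bob", "alice"), ("carol", "alice"), ("carol", "dave")]
nodes, successors = build_friend_graph(pairs)
print(len(nodes), edge_count(successors))
```

In this toy example "dave" is reached only through the link structure, which mirrors why the graph can contain more nodes than there are users with posts.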

    3. MICROBLOGGING IN TWITTER
    This section describes some of the characteristic properties of Twitter's social network, including its network topology and geographical distribution.

    3.1 Growth of Twitter
    Since Twitter provides a sequential user and post identifier, we can estimate the growth rate of Twitter. Figure 2 shows the growth rate for users and Figure 3 shows the growth rate for posts in this collection. Since we do not have access to historical data, we can only observe its growth for a two-month time period. For each day we identify the maximum value for the user identifier and post identifier as provided by the Twitter API. By observing the change in these values, we can roughly estimate the growth of Twitter. It is interesting to note that even though Twitter launched in 2006, it really became popular soon after it won the South by SouthWest (SXSW) conference Web Awards⁶ in March 2007. Figure 2 shows the initial growth in users as a result of the interest and publicity that Twitter generated at this conference. After this period, the rate at which new users are joining the network has slowed. Despite the slowdown, the number of new posts is constantly growing, approximately doubling every month, indicating a steady base of users generating content.

    ⁵ http://twitter.com/help/api

    Figure 2: Twitter User Growth Rate. The figure shows the maximum user ID observed for each day in the dataset (y-axis: Max User ID; x-axis: April - May 2007). After an initial period of interest around March 2007, the rate at which new users are joining Twitter has slowed.
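The growth estimate in Section 3.1 amounts to taking the largest identifier seen on each day and looking at how it changes. The sketch below shows one way to do this in Python; the record layout and the identifier values are made up for illustration, and the approach relies on the same assumption the text states, namely that identifiers are assigned sequentially.

```python
from collections import defaultdict
from datetime import date

def daily_max_ids(records):
    """records: iterable of (day, identifier). Returns {day: largest identifier seen}."""
    max_by_day = defaultdict(int)
    for day, ident in records:
        max_by_day[day] = max(max_by_day[day], ident)
    return dict(max_by_day)

def daily_growth(max_by_day):
    """Change in the maximum identifier between consecutive observed days."""
    days = sorted(max_by_day)
    return {cur: max_by_day[cur] - max_by_day[prev]
            for prev, cur in zip(days, days[1:])}

# Hypothetical observations: (day, user identifier attached to a fetched post).
records = [
    (date(2007, 4, 1), 100), (date(2007, 4, 1), 120),
    (date(2007, 4, 2), 150), (date(2007, 4, 3), 155),
]
max_ids = daily_max_ids(records)
growth = daily_growth(max_ids)
print(growth)
```

The differences are only a rough proxy for new registrations or posts per day, since the public timeline is sampled and the newest identifier may not be observed on a given day.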

    Following Kolari et al. [18], we use the following definitions of user activity and retention:

    Definition: A user is considered active during a week if he or she has posted at least one post during that week.

    Definition: An active user is considered retained for the given week if he or she posts again at least once in the following X weeks.

    Due to the short time period for which the data is available and the nature of microblogging, we decided to use X as a period of one week. Figure 4 shows the user activity and retention for the duration of the data. About half of the users are active, and about half of these active users post again in the following week. Lower activity is recorded during the last week of the data because updates from the public timeline are not available for two days during this period.
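The two definitions above translate directly into set operations. The following Python sketch assumes each post has already been reduced to a (user, week index) pair; the function names and the toy data are hypothetical.

```python
from collections import defaultdict

def weekly_activity(posts):
    """posts: iterable of (user, week_index). Returns {week: set of active users}."""
    active = defaultdict(set)
    for user, week in posts:
        active[week].add(user)
    return dict(active)

def retained_users(active, week, x=1):
    """Users active in `week` who post again in any of the following x weeks."""
    later = set()
    for w in range(week + 1, week + x + 1):
        later |= active.get(w, set())
    return active.get(week, set()) & later

# Toy data: user "a" posts in weeks 1 and 2, "b" in weeks 1 and 3, "c" in week 2.
posts = [("a", 1), ("b", 1), ("a", 2), ("c", 2), ("b", 3)]
active = weekly_activity(posts)
print(retained_users(active, 1))        # retention with X = 1 week, as used in the text
print(retained_users(active, 1, x=2))   # a longer window retains more users
```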

    3.2 Network Properties
    The Web, blogosphere, online social networks and human contact networks all belong to a class of "scale-free networks" [3] and exhibit a "small world phenomenon" [33]. It has been shown that many properties, including the degree distributions on the Web, follow a power law distribution [19, 6]. Recent studies have confirmed that some of these properties also hold true for the blogosphere [31].

    ⁶ http://2007.sxsw.com/

    Figure 3: Twitter Posts Growth Rate. The figure shows the maximum post ID observed for each day in the dataset (y-axis: Max Post ID; x-axis: April - May 2007). Although the rate at which new users are joining the network has slowed, the number of posts is increasing at a steady rate.

    Property                  Twitter    WWE
    Total Nodes               87,897     143,736
    Total Links               829,247    707,761
    Average Degree            18.86      4.924
    Indegree Slope            -2.4       -2.38
    Outdegree Slope           -2.4       NA
    Degree correlation        0.59       NA
    Diameter                  6          12
    Largest WCC size          81,769     107,916
    Largest SCC size          42,900     13,393
    Clustering Coefficient    0.106      0.0632
    Reciprocity               0.58       0.0329

    Table 1: Twitter Social Network Statistics

    Table 1 describes some of the properties of Twitter's social network. We also compare these properties with the corresponding values for the Weblogging Ecosystems Workshop (WWE) collection [4] as reported by Shi et al. [31]. The Twitter network shows a high degree correlation (also shown in Figure 6) and high reciprocity. This implies that there are a large number of mutual acquaintances in the graph. New Twitter users often initially join the network on invitation from friends. Further, new friends are added to the network by browsing through user profiles and adding other known acquaintances. A high proportion of reciprocal links has also been observed in other online social networks like Livejournal [22]. Personal communication and contact networks such as cell phone call graphs [25] also have high degree correlation. Figure 5 shows the cumulative degree distributions [27, 8] of Twitter's network. It is interesting to note that the indegree and outdegree slopes are both approximately -2.4. This value for the power law exponent is similar to that found for the Web (typically -2.1 for indegree [11]) and the blogosphere (-2.38 for the WWE collection).

    Figure 4: Twitter User Activity and Retention (number of active and retained users per week, weeks 1 to 8).

    Continent        Number of Users
    North America    21,064
    Europe           7,442
    Asia             6,753
    Oceania          910
    South America    816
    Africa           120
    Others           78
    Unknown          38,994

    Table 2: Geographical distribution of Twitter users. North America, Europe and Asia have the highest adoption of Twitter.
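Two of the statistics in Table 1 are simple enough to sketch in Python. The paper does not spell out its definitions, so both functions below are assumptions: average degree is taken as total (in plus out) degree per node, and reciprocity as the fraction of directed links whose reverse link also exists. Under the first assumption, the node and link counts in Table 1 give about 18.87, in line with the reported 18.86.

```python
def average_degree(num_nodes, num_links):
    """Mean total degree per node: each directed link adds 2 to the degree sum."""
    return 2 * num_links / num_nodes

def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# Node and link counts taken from Table 1 (Twitter column).
twitter_avg = average_degree(87897, 829247)
print(round(twitter_avg, 2))

# Toy graph with hypothetical users: only a <-> b is reciprocated.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
print(reciprocity(toy_edges))
```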

    3.3 Geographical Distribution
    Twitter provides limited profile information such as name, bio, timezone and location. For the 76K users in our collection, about 39K had specified locations that could be parsed correctly and resolved to their respective latitude and longitude coordinates (using the Yahoo! Geocoding API⁷). Figure 7 and Table 2 show the geographical distribution of Twitter users and the number of users in each continent. Twitter is most popular in the US, Europe and Asia (mainly Japan). Tokyo, New York and San Francisco are the major cities where user adoption of Twitter is high [16].

    Twitter's popularity is global and the social network of its users crosses continental boundaries. By mapping each user's latitude and longitude to a continent, we can extract the origin and destination location for every edge. Table 3 shows the distribution of friendship relations across the major continents represented in the dataset. Oceania is used to represent Australia, New Zealand and other island nations. A significant portion (about 45%) of the social network still lies within North America. Moreover, there are more intra-continent links than links across continents. This is consistent with observations that the probability of friendship between two users is inversely proportional to the geographic distance between them [22].

    ⁷ http://developer.yahoo.com/maps/

    Figure 5: Twitter social network has a power law exponent of about -2.4, which is similar to the Web and blogosphere. Panels: (a) Indegree Distribution, plotting P(x ≥ K) against indegree K on logarithmic axes (legend value 2.412); (b) Outdegree Distribution, plotting P(x ≥ K) against outdegree K on logarithmic axes (legend value 2.412).
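The cumulative degree distributions plotted in Figure 5 give, for each degree K, the fraction of nodes whose degree is at least K. A minimal Python sketch of that computation follows; the function name and the toy degree list are hypothetical, and estimating the power law exponent from the resulting curve is a separate step that is not shown here.

```python
from collections import Counter

def cumulative_degree_distribution(degrees):
    """Return {k: P(x >= k)} for every degree value k that occurs in `degrees`."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf = {}
    remaining = n                 # number of nodes with degree >= current k
    for k in sorted(counts):
        ccdf[k] = remaining / n
        remaining -= counts[k]
    return ccdf

# Toy indegree sequence for four hypothetical users.
ccdf = cumulative_degree_distribution([1, 1, 2, 3])
print(ccdf)
```

Plotted on logarithmic axes, a power law distribution shows up as an approximately straight line, which is what Figure 5 illustrates for both indegree and outdegree.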

    In Table 4, we compare some of the network properties across the three continents with the most users: North America, Europe and Asia. For each continent, the social network is extracted by considering only the subgraph where both the source and destination of the friendship relation belong to the same continent. Asian and European communities have a higher degree correlation and reciprocity than their North American counterparts. Language plays an important role in such social networks. Many users from Japan and the Spanish-speaking world connect with others who speak the same language. In general, users in Europe and Asia tend to have higher reciprocity and clustering coefficient values in their corresponding subgraphs.
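The continent-level analysis can be sketched as follows, assuming each located user has already been mapped to a continent label. The dictionary `continent_of`, the function names and the toy data are all hypothetical; users without a resolved location are simply skipped.

```python
from collections import Counter

def continent_edge_counts(edges, continent_of):
    """Count edges by (source continent, destination continent); skip unlocated users."""
    counts = Counter()
    for u, v in edges:
        if u in continent_of and v in continent_of:
            counts[(continent_of[u], continent_of[v])] += 1
    return counts

def continent_subgraph(edges, continent_of, continent):
    """Edges whose source and destination both belong to `continent`."""
    return [(u, v) for u, v in edges
            if continent_of.get(u) == continent and continent_of.get(v) == continent]

# Toy data: user "e" has no resolved location.
continent_of = {"a": "North America", "b": "North America", "c": "Europe", "d": "Asia"}
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "e")]
counts = continent_edge_counts(edges, continent_of)
na_edges = continent_subgraph(edges, continent_of, "North America")
print(counts)
print(na_edges)
```

Statistics such as reciprocity can then be computed on each per-continent edge list separately.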

    and maximize flow/minimize cut to detect communities. In the social network area, Newman and Girvan [13, 7] proposed a metric called modularity to measure the strength of the community structure. The intuition is that a good division of a network into communities is not merely one in which the number of edges running between communities is small; rather, it is one in which the number of edges between groups is smaller than expected. Only if the number of between-group edges is significantly lower than what would be expected purely by chance can we justifiably claim to have found significant community structure. Based on the modularity measure of the network, optimization algorithms have been proposed to find good divisions of a network into communities by optimizing the modularity over possible divisions. This optimization process can also be related to the eigenvectors of matrices. However, in the above algorithms, each node has to belong to exactly one community, while in real networks communities often overlap. One person can serve a totally different function in different communities. In an extreme case, one user can serve as the information source in one community and the information seeker in another.
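To make the modularity intuition concrete, here is a small Python sketch that scores a given partition. It uses the common undirected form, Q = sum over communities of (internal edges / m) minus (community degree sum / 2m) squared, where m is the number of edges. Treating the network as undirected, as well as the function name and the toy graph, are assumptions of this sketch rather than details from the paper.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected graph with at least one edge.

    edges: list of (u, v) undirected edges, each listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}     # community -> number of edges with both ends inside it
    degree_sum = {}   # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}
print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which scores zero: the split has fewer between-group edges than chance would predict.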

    People in friendship communities often know each other. Prompted by this intuition, we applied the Clique Percolation Method (CPM) [28, 10] to find overlapping communities in networks. The CPM is based on the observation that a typical member in a community is linked to many other members, but not necessarily to all other nodes in the same community. In CPM, the k-clique-communities are identified by looking for the unions of all k-cliques that can be reached from each other through a series of adjacent k-cliques, where two k-cliques are said to be adjacent if they share k-1 nodes. This algorithm is suitable for detecting the dense communities in the network.
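The k-clique-community definition above can be sketched in a few lines of Python. This is a brute-force illustration for small graphs, not the implementation used in the study: it treats friendships as undirected (an assumption), enumerates every k-node subset to find cliques, and merges adjacent cliques with a union-find structure. All names are hypothetical.

```python
from itertools import combinations

def k_cliques(adj, k):
    """All k-node cliques, found by brute force (practical only for small graphs)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]

def k_clique_communities(edges, k):
    """Clique percolation: union of k-cliques reachable through cliques sharing k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))     # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:   # adjacent k-cliques
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())

# Triangles abc and bcd share two nodes; triangle def shares only node d with bcd.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
communities = k_clique_communities(edges, 3)
print(sorted(sorted(c) for c in communities))
```

In the toy graph, node "d" ends up in both communities, which is the overlapping behaviour that motivates using CPM instead of a strict partition.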

    Here we list a few specific examples of how communities form in Twitter and what the user intentions are in each community. Figure 8 illustrates a representative community with 58 users closely communicating with each other through the Twitter service. The key terms they talk about include "work", "Xbox", "game", and "play". It appears that users with gaming interests are getting together to discuss information about new products on this topic or to share gaming experiences. When we go to specific users' pages, we also find this type of conversation: "BDazzler@Steve519 I don't know about the Jap PS3s. I think they have region encoding, so you'd only be able to play Jap games. Euro has no ps2 chip" or "BobbyBlackwolf Playing with the PS3 firmware update, can't get WMP11 to share MP4s and the PS3 won't play WMVs or AVIs...Fail." We also noticed that users in this community share their personal feelings and daily life experiences with each other, in addition to comments on gaming. Based on our study of the communities in the Twitter dataset, we observed that this is a representative community in the Twitter network: people in one community have certain common interests and they also share their personal feelings and daily experiences with each other.

    Using CPM, we are able to find how communities are connected to each other by overlapping components. Figure 9 illustrates two communities with podcasting interests, where GSPN and pcamarata are the users who connect these two communities. In GSPN's bio, he mentions that he is the Producer of the Generally Speaking Podcast Network⁸, while in pcamarata's bio, he mentions that he is a family man, a neurosurgeon, and a podcaster. By looking at the top key terms of these two communities, we can see that the focus of the green community is a little more diversified: people occasionally talk about podcasting, while the topic of the red community is a little more focused. In a sense, the red community is like a professional podcasting community while the green one is an informal community about podcasting.

    Figure 10 illustrates five communities connected by Scobleizer, who is a "Tech geek blogger". People follow his posts to get technology news. People in different communities share different interests with Scobleizer. Specifically, AndruEdwards, Scobleizer, daryn, and davidgeller get together to share video-related news. CaptSolo et al. have some interest in the Semantic Web. AdoMatic et al. are engineers and are interested in coding-related issues.

    Studying intentions at a community level, we observe that users participate in communities which share similar interests. Individuals may have different intentions for joining these communities. While some act as information providers, others are merely looking for new and interesting information. Next, we analyze aggregate trends: across users spread over many communities, we can identify certain distinct themes. Often there are recurring patterns in word usage. Such patterns may be observed over a day or a week. For example, Figure 11 shows the trends for the terms "friends" and "school" in the entire corpus. While "school" is of interest during weekdays, "friends" takes over on the weekends. The log-likelihood ratio is used to determine terms that are of significant importance for a given day of the week. Using a technique described by Rayson and Garside [30], we create a contingency table of term frequencies for a given day and the rest of the week.

    Figure 11: Daily trends for the terms "school" and "friends". The term "school" is more frequent during the early week; "friends" takes over during the weekend.

    ⁸ http://ravenscraft.org/gspn/home

    Key Terms (term:count): just:273, com:225, work:185, like:172, good:168, going:157, got:152, time:142, live:136, new:133, xbox:125, tinyurl:122, today:121, game:115, playing:115, twitter:109, day:108, lol:10, play:100, halo:100, night:90, home:89, getting:88, need:86, think:85, gamerandy:85, ll:85, 360:84, watching:79, want:78, know:77

    Figure 8: An example of a gaming community whose members also share daily experiences.

                             Day    Rest of the Week    Total
    Freq of word             a      b                   a+b
    Freq of other words      c-a    d-b                 c+d-a-b
    Total                    c      d                   c+d

    Comparing the terms that occur on a given day with the histogram of terms for the rest of the week, we find the most descriptive terms. The log-likelihood score is calculated as follows:

    $$LL = 2\left(a \log\frac{a}{E_1} + b \log\frac{b}{E_2}\right)$$

    where $E_1 = c\,\frac{a+b}{c+d}$ and $E_2 = d\,\frac{a+b}{c+d}$.
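The log-likelihood score can be computed directly from the four cells of the contingency table. The Python sketch below follows the formula above; the use of the natural logarithm and the convention that a zero count contributes nothing to the sum are assumptions, and the example counts are made up.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood score of a term.

    a, b: frequency of the term on the given day / in the rest of the week.
    c, d: total number of term occurrences on the given day / in the rest of the week.
    """
    e1 = c * (a + b) / (c + d)    # expected frequency on the given day
    e2 = d * (a + b) / (c + d)    # expected frequency in the rest of the week
    ll = 0.0
    if a > 0:                     # a zero count contributes nothing (assumed convention)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

# Hypothetical counts: the term is used equally often in both parts of the week...
balanced = log_likelihood(10, 10, 100, 100)
# ...versus twice as often on the given day.
skewed = log_likelihood(10, 5, 100, 100)
print(balanced, skewed)
```

A term whose frequency matches its expected value scores zero, and the score grows as the observed frequency on the given day departs from what the rest of the week would predict.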

    Figure 12 shows the most descriptive terms for each day of the week. Some of the extracted terms correspond to recurring events and activities significant for a particular day of the week, for example "school" or "party". Other terms are related to current events like "easter" and "EMI".

    5. DISCUSSION
    This section presents a brief taxonomy of user intentions on Twitter. The apparent intention of a Twitter post was determined manually by the first author. Each post was read and categorized. Posts that were highly ambiguous, or for which the author could not make a judgement, were placed in the category UNKNOWN. Based on this analysis, we have found the following to be some of the main user intentions on Twitter:

    Daily Chatter: Most posts on Twitter talk about daily routine or what people are currently doing. This is the largest and most common use of Twitter.

    Conversations: In Twitter, since there is no direct way for people to comment on or reply to their friends' posts, early adopters started using the @ symbol followed by a username for replies. About one eighth of all posts in the collection contain a conversation, and this form of communication was used by almost 21% of users in the collection.

    Key Terms, first community (term:count): going:222, just:218, work:170, night:143, bed:140, time:139, good:137, com:130, lost:124, day:122, home:112, listening:111, today:100, new:98, got:97, gspn:92, watching:92, kids:88, morning:81, twitter:79, getting:77, tinyurl:75, lunch:74, like:72, podcast:72, watch:71, ready:70, tv:69, need:64, live:61, tonight:61, trying:58, love:58, cliff:58, dinner:56

    Key Terms, second community (term:count): just:312, com:180, work:180, time:149, listening:147, home:145, going:139, day:134, got:126, today:124, good:116, bed:114, night:112, tinyurl:97, getting:88, podcast:87, dinner:85, watching:83, like:78, mass:78, lunch:72, new:72, ll:70, tomorrow:69, ready:64, twitter:62, working:61, tonight:61, morning:58, need:58, great:58, finished:55, tv:54

    Figure 9: Example of how two communities connect to each other

    Sharing information/URLs: About 13% of all the posts in the collection contain some URL in them. Due to the small character limit, a URL shortening service like TinyURL⁹ is frequently used to make this feature feasible.

    Reporting news: Many users report the latest news or comment on current events on Twitter. Some automated users or agents post updates like weather reports and news stories from RSS feeds. This is an interesting application of Twitter that has evolved due to easy access to the developer API.
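Two of the categories above leave surface traces in the text of a post: replies use "@" followed by a username, and shared links contain a URL. The sketch below counts such traces with regular expressions. It is only a rough illustration with hypothetical patterns and toy posts; the categorization reported in this section was done manually, and simple patterns like these can misfire (for example, on email addresses).

```python
import re

REPLY_RE = re.compile(r"@\w+")            # "@" followed by a username
URL_RE = re.compile(r"https?://\S+")      # simple URL pattern (an assumption)

def surface_features(post):
    """Flag whether a post's text contains an @username mention or a URL."""
    return {"conversation": bool(REPLY_RE.search(post)),
            "url": bool(URL_RE.search(post))}

def feature_shares(posts):
    """Fraction of posts carrying each surface feature (posts must be non-empty)."""
    flags = [surface_features(p) for p in posts]
    return {key: sum(f[key] for f in flags) / len(flags)
            for key in ("conversation", "url")}

# Toy posts; the shortened URL is made up.
posts = [
    "@Steve519 I think they have region encoding",
    "interesting read http://tinyurl.com/abc",
    "off to bed",
    "lunch with friends",
]
shares = feature_shares(posts)
print(shares)
```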

    Using the link structure, the following are the main categories of users on Twitter:

    Information Source: An information source is also a hub and has a large number of followers. This user may post updates at regular intervals or infrequently. Despite infrequent updates, certain users have a large number of followers due to the valuable nature of their updates. Some of the information sources were also found to be automated tools posting news and other useful information on Twitter.

    Friends: Most relationships fall into this broad category. There are many sub-categories of friendships on Twitter. For example, a user may have friends, family and co-workers on their friend or follower lists. Sometimes unfamiliar users may also add someone as a friend.

    Information Seeker: An information seeker is a person who might post rarely, but follows other users regularly.

    ⁹ http://www.tinyurl.com
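As a very rough illustration of how link counts relate to these categories, the Python sketch below labels a user from follower and friend counts alone. The threshold `ratio` and the decision rule are invented for this example and are not taken from the paper, which also considers posting behaviour; the sketch only shows the direction of the asymmetry each role implies.

```python
def user_role(num_followers, num_friends, ratio=2.0):
    """Rough role label from link counts; `ratio` is an illustrative threshold only."""
    if num_followers >= ratio * max(num_friends, 1):
        return "information source"    # followed by many more users than it follows
    if num_friends >= ratio * max(num_followers, 1):
        return "information seeker"    # follows many users, followed by few
    return "friend"                    # roughly balanced, largely mutual links

# Hypothetical (followers, friends) counts.
print(user_role(1000, 10))
print(user_role(5, 200))
print(user_role(40, 50))
```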

    Our study has revealed different motivations and utilities of microblogging platforms. A single user may have multiple intentions or may even serve different roles in different communities. For example, there may be posts meant to update your personal network on a holiday plan or a post to share an interesting link with co-workers. Multiple user intentions have led to some users feeling overwhelmed by microblogging services [20]. Based on our analysis of user intentions, we believe that the ability to categorize friends into groups (e.g. family, co-workers) would greatly benefit the adoption of microblogging platforms. In addition, features that could help facilitate conversations and sharing news would be beneficial.

    Key Terms, community 1 (term:count): com:175, twitter:134, just:133, like:86, good:82, tinyurl:75, time:74, new:74, jasona:73, going:68, day:63, don:61, work:58, think:56, ll:54, scottw:54, today:52, hkarthik:50, nice:49, getting:47, got:47, really:46, yeah:44, need:43, watching:41, love:41, night:40, home:40

    Key Terms, community 2 (term:count): com:93, twitter:74, just:35, new:32, tinyurl:29, going:24, ll:22, blog:21, jaiku:21, don:21, leo:21, flickr:21, like:19, video:18, google:18, today:18, feeds:18, getting:16, yeah:16, good:15, people:15

    Key Terms, community 3 (term:count): com:93, twitter:76, tinyurl:34, just:32, new:28, video:26, going:24, ll:22, jaiku:22, blog:21, leo:21, like:19, don:19, gamerandy:19, yeah:18, google:17, live:16, people:16, got:16, know:15, time:15

    Key Terms, community 4 (term:count): com:121, twitter:76, just:50, ustream:43, tv:42, live:42, today:39, hawaii:36, day:33, new:33, time:33, good:33, video:32, leo:30, work:30, like:28, watching:28, tinyurl:28

    Key Terms, community 5 (term:count): com:198, twitter:132, just:109, tinyurl:87, going:59, blog:56, like:55, good:51, new:50, url:50, day:49, people:46, time:45, today:45, google:42, don:41, think:40, night:38, ll:38, need:35, got:33, ireland:33, great:31, looking:29, work:29, thanks:28, video:26

    Figure 10: Example communities in the Twitter social network. Key terms indicate that these communities are talking mostly about technology. The user Scobleizer connects multiple communities in the network.

    6. CONCLUSION
    In this study we have analyzed a large social network in a new form of social media known as microblogging. Such networks were found to have a high degree correlation and reciprocity, indicating close mutual acquaintances among users. While determining an individual user's intention in using such applications is challenging, by analyzing the aggregate behavior across communities of users, we can describe the community intention. Understanding these intentions and learning how and why people use such tools can be helpful in improving them and adding new features that would retain more users. In this work, we have identified different types of user intentions and studied the community structures. Currently, we are working on automated approaches for detecting user intentions with related community structures.

    7. ACKNOWLEDGEMENTS
    We would like to thank Twitter Inc. for providing an API to their service and Pranam Kolari, Xiaolin Shi and Amit Karandikar for their suggestions.

    8. REFERENCES[1] J. Allen. Recognizing intentions from natural language

    utterances. Computational Models of Discourse , pages107166, 1983.

    [2] J. Austin. How to Do Things with Words . OxfordUniversity Press Oxford, 1976.

    [3] A.-L. Barabasi and R. Albert. Emergence of scaling inrandom networks. Science , 286:509, 1999.

  • 8/14/2019 Why We Twitter: Understanding Microblogging Usage and Communities

    10/10

    Figure 12: Distinctive terms for each day of the weekranked using Log-likelihood ratio.

    [4] Blogpulse. The 3rd annual workshop on webloggingecosystem: Aggregation, analysis and dynamics, 15thworld wide web conference, May 2006.

    [5] A. Broder. A taxonomy of web search. SIGIR Forum ,36(2):310, 2002.

    [6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan,

    S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener.Graph structure in the web. In Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking , pages309320, Amsterdam, The Netherlands, TheNetherlands, 2000. North-Holland Publishing Co.

    [7] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. Physical Review E, 70:066111, 2004.

    [8] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data, Jun 2007.

    [9] Comscore. http://www.usatoday.com/tech/webguide/2007-05-28-social-sites_N.htm .

    [10] I. Derenyi, G. Palla, and T. Vicsek. Clique percolation in random networks. Physical Review Letters, 94:160202, 2005.

    [11] D. Donato, L. Laura, S. Leonardi, and S. Millozzi. Large scale properties of the webgraph. European Physical Journal B, 38:239-243, March 2004.

    [12] G. W. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization of the web and identification of communities. IEEE Computer, 35(3):66-71, 2002.

    [13] M. Girvan and M. E. J. Newman. Community structure in social and biological networks, Dec 2001.

    [14] H. Grice. Utterer's meaning and intentions. Philosophical Review, 78(2):147-177, 1969.

    [15] B. J. Grosz. Focusing and Description in Natural Language Dialogues. Cambridge University Press, New York, New York, 1981.

    [16] A. Java. http://ebiquity.umbc.edu/blogger/2007/04/15/global-distribution-of-twitter-users/ .

    [17] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, 1999.

    [18] P. Kolari, T. Finin, Y. Yesha, Y. Yesha, K. Lyons, S. Perelgut, and J. Hawkins. On the Structure, Properties and Utility of Internal Corporate Blogs. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), March 2007.

    [19] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Computer Networks (Amsterdam, Netherlands: 1999), 31(11-16):1481-1493, 1999.

    [20] A. Lavallee. Friends swap twitters, and frustration - new real-time messaging services overwhelm some users with mundane updates from friends, March 16, 2007.

    [21] T. Lento, H. T. Welser, L. Gu, and M. Smith. The ties that blog: Examining the relationship between social ties and continued participation in the wallop weblogging system, 2006.

    [22] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic routing in social networks. Proceedings of the National Academy of Sciences, 102(33):11623-1162, 2005.

    [23] Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, and B. Tseng. Discovery of Blog Communities based on Mutual Awareness. In Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference, May 2006.

    [24] S. Matsubara, S. Kimura, N. Kawaguchi, Y. Yamaguchi, and Y. Inagaki. Example-based Speech Intention Understanding and Its Application to In-Car Spoken Dialogue System. Proceedings of the 19th international conference on Computational linguistics - Volume 1, pages 1-7, 2002.

    [25] A. A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and A. Joshi. On the structural properties of massive telecom call graphs: findings and implications. In CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management, pages 435-444, New York, NY, USA, 2006. ACM Press.

    [26] B. A. Nardi, D. J. Schiano, M. Gumbrecht, and L. Swartz. Why we blog. Commun. ACM, 47(12):41-46, 2004.

    [27] M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46:323, 2005.

    [28] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435:814, 2005.

    [29] J. Pontin. From many tweets, one loud voice on the internet. The New York Times, April 22, 2007.

    [30] P. Rayson and R. Garside. Comparing corpora using frequency profiling, 2000.

    [31] X. Shi, B. Tseng, and L. A. Adamic. Looking at the blogosphere topology through different lenses. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 2007), March 2007.

    [32] P. Strawson. Intention and Convention in Speech Acts. The Philosophical Review, 73(4):439-460, 1964.

    [33] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440-442, June 1998.