The Social Brand Project
Research on Brands’ Use of Twitter: Who Follows Brands?
“Understanding Twitter Data Structure” or
“Unleashing Your Inner Geek”
The OPD Problem:
OPD = Other People’s Data
Alphabet Soup
• HTTP = HyperText Transfer Protocol
• HTML = HyperText Markup Language
• RSS = Really Simple Syndication
• API = Application Programming Interface
• XML = Extensible Markup Language
Source Code from “SimpleWebPage.html”:
<html>
<body> This is the world's simplest web page. </body>
</html>
Every Tweet has a unique address.
View Source
New Twitter
Every Twitter user has
• a distinct username
• a distinct ID number
Old Twitter
RSS
View Source (Firefox)
New Twitter: More difficult to find User ID
Buried in the HTML code: twttr.API._requestCache.inject("account/verify_credentials", [{}], {"is_translator":false,"show_all_inline_media":false,"favourites_count":63,"profile_background_color":"7AC4EE","url":"http:\/\/www.facebook.com\/joebobhester","follow_request_sent":false,"profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/26007960\/TwitterBKG.gif","description":"UNC advertising professor, social media research, NCAA football fan, poker player, amateur chef","screen_name":"joebobhester","status":{"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"text":"Same for AD\/PR! RT @rtburg: #uncjomc: \u201cThe phrase \u2018I don\u2019t do math\u2019 is no longer acceptable\u201d for journalists.\" Jack Gillum at #NICAR11","contributors":null,"retweeted":false,"in_reply_to_user_id_str":null,"retweet_count":0,"geo":null,"source":"\u003Ca href=\"http:\/\/www.tweetdeck.com\" rel=\"nofollow\"\u003ETweetDeck\u003C\/a\u003E","created_at":"Fri Feb 25 19:08:51 +0000 2011","id_str":"41213050061066240","place":null,"in_reply_to_status_id":null,"coordinates":null,"truncated":false,"favorited":false,"id":41213050061066240,"in_reply_to_screen_name":null},"verified":false,"friends_count":5978,"location":"Chapel Hill, NC","geo_enabled":true,"time_zone":"Eastern Time (US & Canada)","profile_text_color":"3D1957","lang":"en","notifications":false,"created_at":"Fri May 22 16:57:20 +0000 2009","profile_sidebar_fill_color":"7ac4ee","id_str":"41852681","listed_count":510,"statuses_count":9100,"profile_background_tile":false,"followers_count":9101,"profile_link_color":"FF0000","protected":false,"profile_sidebar_border_color":"65B0DA","name":"Joe Bob Hester","following":false,"id":41852681,"contributors_enabled":false,"profile_use_background_image":true,"utc_offset":-18000,"profile_image_url":"http:\/\/a2.twimg.com\/profile_images\/336606778\/TwitterPhoto_normal.gif"}, 1);
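Reading the ID out of that blob by eye is error-prone, because it contains two "id" fields: one for the latest status and one for the user. A minimal Python sketch of pulling out the user's ID follows. The page source is shortened here to a few fields, the function name `extract_user` is made up for illustration, and the code assumes the user object always follows the `[{}], ` marker seen above.

```python
import json

# Shortened stand-in for the line found in the page source (most fields trimmed).
page_source = (
    'twttr.API._requestCache.inject("account/verify_credentials", [{}], '
    '{"screen_name":"joebobhester","id_str":"41852681","id":41852681,'
    '"followers_count":9101}, 1);'
)

def extract_user(source):
    """Return the user object embedded after the '[{}], ' marker (assumed layout)."""
    marker = "[{}], "
    start = source.index(marker) + len(marker)
    # raw_decode parses one JSON value and ignores the text that follows it.
    user, _end = json.JSONDecoder().raw_decode(source[start:])
    return user

user = extract_user(page_source)
print(user["screen_name"], user["id"])
```

Because the whole object is parsed, `user["id"]` is the top-level user ID, while the status ID would sit under `user["status"]["id"]` in the full blob.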
API
• The acronym "API" stands for "Application Programming Interface".
• An API is just a defined way for a program to accomplish a task, usually retrieving or modifying data.
• In Twitter's case, we provide an API method for just about every feature you can see on our website.
• Programmers use the Twitter API to make applications, websites, widgets, and other projects that interact with Twitter.
• Programs talk to the Twitter API over HTTP, the same protocol that your browser uses to visit and interact with web pages.
http://dev.twitter.com
http://api.twitter.com/1/statuses/user_timeline.xml?screen_name=joebobhester&include_rts=true
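An API address like this one is just a method name plus a list of parameters. As a rough sketch, it can be assembled in Python as shown here. The helper name `api_url` is hypothetical, and the code only builds the address; it does not contact Twitter.

```python
from urllib.parse import urlencode

API_ROOT = "http://api.twitter.com/1"

def api_url(method, **params):
    """Build an XML API address from a method name and its query parameters."""
    return f"{API_ROOT}/{method}.xml?{urlencode(params)}"

url = api_url("statuses/user_timeline",
              screen_name="joebobhester", include_rts="true")
print(url)
```

Everything before the "?" names the method; everything after it is a parameter=value pair, joined by "&".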
What is XML?
• XML stands for Extensible Markup Language.
• XML is a markup language much like HTML.
• XML was designed to carry data, not to display data.
• XML tags are not predefined. You must define your own tags.
• XML is designed to be self-descriptive.
• XML is everywhere.
XML Documents Form a Tree Structure
• XML documents must contain a root element. This element is "the parent" of all other elements.
• The elements in an XML document form a document tree that starts at the root and branches to the lowest level of the tree.
• All elements can have sub elements (child elements):
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
• The terms parent, child, and sibling are used to describe the relationships between elements. Parent elements have children. Children on the same level are called siblings (brothers or sisters).
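The parent, child, and sibling relationships can be seen by loading a small document with Python's built-in XML parser. This is only an illustrative sketch: the document is the root/child/subchild pattern above with a second subchild and placeholder text added so that there are siblings to show.

```python
import xml.etree.ElementTree as ET

# Two <subchild> elements are used here so the example has siblings.
doc = ("<root><child>"
       "<subchild>first</subchild><subchild>second</subchild>"
       "</child></root>")

root = ET.fromstring(doc)             # the root element, parent of everything
child = root[0]                       # the first (and only) child of <root>
siblings = [el.text for el in child]  # the <subchild> elements share one parent

print(root.tag, child.tag, siblings)
```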
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
  <status>
    <created_at>Fri Feb 25 17:53:47 +0000 2011</created_at>
    <id>41194158391701504</id>
    <text>Thanks! RT @LarryTolpin: Those Who Teach ... #FF</text>
    <user>
      <id>41852681</id>
      <name>Joe Bob Hester</name>
      <screen_name>joebobhester</screen_name>
      <location>Chapel Hill, NC</location>
    </user>
  </status>
  <status>
    <created_at>Fri Feb 25 17:52:01 +0000 2011</created_at>
    <id>41193714781143040</id>
    <text>Study: Consumers Combine ... http://bit.ly/f75ioI</text>
    <user>
      <id>41852681</id>
      <name>Joe Bob Hester</name>
      <screen_name>joebobhester</screen_name>
      <location>Chapel Hill, NC</location>
    </user>
  </status>
</statuses>
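Because each tweet is a status element with the same children, a program can walk the tree and pull out one row per tweet. The sketch below uses Python's standard XML parser on a trimmed copy of the timeline above (the XML declaration and a few fields are left out, and the tweet text is kept exactly as abbreviated on the slide).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the timeline shown above.
timeline_xml = """<statuses type="array">
  <status>
    <id>41194158391701504</id>
    <text>Thanks! RT @LarryTolpin: Those Who Teach ... #FF</text>
    <user><id>41852681</id><screen_name>joebobhester</screen_name></user>
  </status>
  <status>
    <id>41193714781143040</id>
    <text>Study: Consumers Combine ... http://bit.ly/f75ioI</text>
    <user><id>41852681</id><screen_name>joebobhester</screen_name></user>
  </status>
</statuses>"""

statuses = ET.fromstring(timeline_xml)

# One (tweet ID, screen name, text) row per <status> element.
rows = [
    (s.findtext("id"), s.findtext("user/screen_name"), s.findtext("text"))
    for s in statuses.findall("status")
]

for row in rows:
    print(row)
```

Note that `s.findtext("id")` returns the tweet's ID, while the user's ID sits one level down at `user/id`.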
http://api.twitter.com/1/users/show.xml?user_id=41852681
View Source:
How do we find users’ ID numbers?
http://api.twitter.com/1/followers/ids.xml?screen_name=joebobhester&cursor=-1
View Source:
<?xml version="1.0" encoding="UTF-8"?>
<id_list>
  <ids>
    <id>228442651</id>
    <id>202814670</id>
    <id>28825315</id>
    . . .
    <id>16221120</id>
    <id>15822191</id>
  </ids>
  <next_cursor>1328658213834945578</next_cursor>
  <previous_cursor>0</previous_cursor>
</id_list>
First 5,000:
http://api.twitter.com/1/followers/ids.xml?screen_name=joebobhester&cursor=-1
Next page (up to 5,000):
http://api.twitter.com/1/followers/ids.xml?screen_name=joebobhester&cursor=1328658213834945578
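The paging pattern above (start at cursor=-1, then reuse each page's next_cursor) can be sketched as a loop in Python. Several things here are assumptions rather than part of the slides: the function names, the rule that a next_cursor of 0 marks the last page, and the two tiny canned pages, which are made-up stand-ins for real responses so that the example runs without contacting Twitter.

```python
import xml.etree.ElementTree as ET

BASE = "http://api.twitter.com/1/followers/ids.xml"

def page_url(screen_name, cursor="-1"):
    """Address of one page of follower IDs."""
    return f"{BASE}?screen_name={screen_name}&cursor={cursor}"

def collect_follower_ids(screen_name, fetch):
    """Follow next_cursor from page to page; fetch(url) must return XML text."""
    ids, urls, cursor = [], [], "-1"
    while cursor != "0":  # assumption: a next_cursor of 0 means no more pages
        url = page_url(screen_name, cursor)
        urls.append(url)
        page = ET.fromstring(fetch(url))
        ids.extend(el.text for el in page.iter("id"))
        cursor = page.findtext("next_cursor", default="0")
    return ids, urls

# Made-up two-page response set standing in for the live API.
FAKE_PAGES = {
    page_url("joebobhester", "-1"):
        "<id_list><ids><id>228442651</id><id>202814670</id></ids>"
        "<next_cursor>1328658213834945578</next_cursor></id_list>",
    page_url("joebobhester", "1328658213834945578"):
        "<id_list><ids><id>16221120</id><id>15822191</id></ids>"
        "<next_cursor>0</next_cursor></id_list>",
}

ids, urls = collect_follower_ids("joebobhester", FAKE_PAGES.__getitem__)
print(len(ids), "IDs from", len(urls), "pages")
```

Passing the download step in as `fetch` keeps the paging logic separate from the network, so the same loop works whether pages come from a browser save, a file, or an HTTP library.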
Social Brand Project Assignment #3
Use the Twitter API to download all user IDs for your assigned brand (example = WSJ).
• Save the HTTP address of each page in a text document you will turn in:
Page 1 = http://api.twitter.com/1/followers/ids.xml?screen_name=wsj&cursor=-1
Page 2 = http://api.twitter.com/1/followers/ids.xml?screen_name=wsj&cursor=1361624534012814181
• Save each page as an XML file.
• Name files: ids_page1_username.xml
• List the file names in your text document as well.
ids_page1_wsj.xml
ids_page2_wsj.xml
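For brands with many pages, the text document can be generated rather than typed. The following is one possible Python sketch; the function names `page_filename` and `manifest` are invented for illustration, and the layout simply mirrors the WSJ example above.

```python
def page_filename(page_number, username):
    """File name in the ids_page1_username.xml pattern."""
    return f"ids_page{page_number}_{username}.xml"

def manifest(username, urls):
    """Text to turn in: the page addresses first, then the file names."""
    lines = [f"Page {n} = {url}" for n, url in enumerate(urls, start=1)]
    lines += [page_filename(n, username) for n in range(1, len(urls) + 1)]
    return "\n".join(lines)

wsj_urls = [
    "http://api.twitter.com/1/followers/ids.xml?screen_name=wsj&cursor=-1",
    "http://api.twitter.com/1/followers/ids.xml?screen_name=wsj&cursor=1361624534012814181",
]
print(manifest("wsj", wsj_urls))
```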