Sweeper User Guide v0.3

User Guide

Updated April 3, 2011
Living Document for Sweeper v0.3

http://swiftly.org/userguide

Table of Contents

I. Introduction
II. Using this Living Document
III. About the Sweeper Application
    Suggested Uses
        As a FeedReader
        For Passive Data-Processing
        For Active Content Filtering
        For Real-time Social Media Curation
        As a Vertical Content Dashboard
    Terminology
IV. Explaining the Sweeper UI
    Analytic Dashboard
    Main Content Window
    Admin Panel
    View Tabs
    Filter Panel
    Refresh Staging Area
    Rating Panel
    Content Items
V. Overview of Plugins
    Duplicate Content Filter
    Google Language Services
    Geo-Location (Yahoo)
    Tagging
    Ushahidi Push
    Tag Clustering
    Annotations*
    Quiver/Bookmarking*
VI. Adding Sources
    Email (IMAP)
    Email (Gmail)
    FrontlineSMS
    News & Blog Search
    RSS/ATOM
    Flickr
    SMS Gateways
    Twitter

I. Introduction

Thanks for using the Sweeper application! Sweeper is meant to be fairly intuitive, but we’re well aware that it can be a little overwhelming to get started and to know what’s possible. This guide walks you through using Sweeper and a handful of its native plugins. It is not an installation guide (for that, see http://wiki.ushahidi.com/doku.php?id=install_s). If you are a developer seeking information on how to develop plugins, parsers or other modules for Sweeper and other SwiftRiver applications, see the developer guide (https://docs.google.com/document/pub?id=1j70pxPF1N3N8Egs3amLCKr5aaGhrnQj6U5R8HuLWpjc&pli=1).

II. Using this Living Document

Because Sweeper is an open-source product whose code and feature set change quite frequently, this user guide is a living document that serves only as a snapshot of what’s possible at the time it was last updated. We invite you to revisit this link often. If you decide to print it, be aware that as soon as it’s transferred from bits to pulp, it is essentially outdated. Likewise, any copy of this document distributed in PDF, DOC, or FLV form is likely outdated. The latest version can always be found at http://swiftly.org/userguide/

III. About the Sweeper Application

Sweeper is an application that focuses on the aggregation, curation and filtering of real-time content. It assumes the user knows exactly which sources they are tracking but needs an application to help them prioritize their attention. By way of comparison, Sweeper is something like an open-source version of TweetDeck or, to use a Google analogy, Google Reader. The user defines a number of sources to track, and Sweeper offers a number of ways to filter and view the collected content.

Suggested Uses

What can Sweeper be used for? A number of things, but here are a few ideas...

As a FeedReader

Sweeper was designed for collecting large amounts of disparate real-time data and sweeping through it quickly and efficiently, while also doing things to that content. So there is an emphasis on speed and summation of large datasets, allowing the user to decide where to spend his or her time delving deeper. As mentioned in the examples above, one might consider using Sweeper as a substitute for a traditional feed-reader. However, unlike most feed-readers, there are no restrictions on the type of data that can be aggregated, and there are smart triggers applied to outgoing data (ex. if I perform this function, content is affected in this way). This functionality can be useful for setting up really advanced conditional tasking, which we’ll cover later.

For Passive Data-Processing

Sweeper can also be configured as a passive filter for data, meaning you can set it to aggregate content and then automatically perform certain tasks on it, ex. aggregate all tweets from #hashtag tagged in the state of Maine and send only that data to another platform. When used in this way, Sweeper essentially becomes a smart cron tool equipped with geo-tagging, natural language processing and other powerful contextual features.
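To make the #hashtag/Maine example concrete, here is a minimal sketch in Python. It is not Sweeper's own code: the `ContentItem` fields, the `passive_filter` function and the `forward` callback are all hypothetical names chosen for illustration, and it assumes a geo-location plugin has already attached a state to each item.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ContentItem:
    text: str      # body of the tweet, SMS, etc.
    channel: str   # e.g. "twitter"
    state: str     # hypothetical field, assumed to be set by a geo-location plugin


def passive_filter(items: Iterable[ContentItem], hashtag: str, state: str,
                   forward: Callable[[ContentItem], None]) -> int:
    """Forward only items that contain `hashtag` as a word and were tagged in `state`.

    Returns the number of items forwarded.
    """
    sent = 0
    for item in items:
        words = item.text.lower().split()
        if hashtag.lower() in words and item.state == state:
            forward(item)  # stand-in for "send to another platform"
            sent += 1
    return sent
```

Run on a schedule with no human in the loop, a function like this is what the "smart cron tool" comparison amounts to.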

For Active Content Filtering

Users are also provided a number of utilities for quickly searching through content. Clicking on a selection of tags shows only the content carrying those tags. The cluster panel groups content with similar content from other channels. The user can also filter by assigned score (which can represent the favor they have for some types of content over others) using any range between 1 and 100, ex. show me only the content with a score of 40 or above, or only content between 20 and 60.
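The score and tag filters described above can be pictured with a short Python sketch. This is an illustration only, not the application's implementation; `ContentItem` and `filter_items` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class ContentItem:
    title: str
    score: int                               # assigned score, 1 to 100
    tags: Set[str] = field(default_factory=set)


def filter_items(items: Iterable[ContentItem], min_score: int = 1,
                 max_score: int = 100,
                 required_tags: Iterable[str] = ()) -> List[ContentItem]:
    """Keep items whose score is in [min_score, max_score] and that carry every required tag."""
    wanted = set(required_tags)
    return [
        item for item in items
        if min_score <= item.score <= max_score and wanted <= item.tags
    ]
```

Calling `filter_items(items, min_score=40)` corresponds to "a score of 40 or above", and `filter_items(items, 20, 60)` to "between 20 and 60".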

For Real-time Social Media Curation

Sweeper can be used for real-time media curation across channels (Blogs, News, RSS/ATOM, Twitter, SMS, Email) and across more than 50 languages. For a journalist attempting to collect data that’s rapidly unfolding across social media, this can save a great deal of time. Rather than opening 50 different windows for different apps, the user can have Sweeper mine and add context to disparate content, completely at the user’s whim. Perhaps even more interestingly, all this aggregated data can be annotated, mapped, shared or exported in a number of ways after it’s been structured as the user sees fit.

As a Vertical Content Dashboard

Perhaps you need to know what’s going on across various industries at all times. You could enter the feeds of several well-known bloggers, the @twitternames of thought leaders in that industry, a public-facing email address you control like [email protected], and a public-facing shortcode (ex. 6060). That might just be your sports page. But when you replicate that experience across Entertainment, World News, Food, Lifestyle, etc., you end up with an equally rich, immersive real-time data-mining tool across all those interests.

Terminology

Before we continue, it will help if you have a basic understanding of the terminology we use to discuss the application.

Sweeper (capital ‘S’) - the name of a SwiftRiver application for aggregating and processing feeds of content
sweeper (lowercase ‘s’) - generally, one who performs the function of sweeping through feeds of content. In the Sweeper application, however, the user role of sweeper is assigned to users who can edit tags and process content but who don’t have administrative rights to the application.
sweep - to process data
channel - the distribution type used to deliver content. Twitter, Email, RSS/ATOM and SMS are all channels.
source - the place (or person) from which content originates. A person’s @twittername, email address, blog or web URL, or phone number would all be considered sources. Several sources may be collected to reference a single identity, ex. this blog, this URL and this phone number all belong to the same person.
content item - a single item of content collected from a feed, regardless of the channel it came in on or the source it came from
tag - a layer of taxonomy applied to all content
lat/lon - geospatial coordinates; short for latitude and longitude
veracity - more accurately, the subjective favor the user (or users) has for content. The baseline of favor expressed for certain types of content is used as a building block for a score applied to content. This score is then used both for prioritizing sources and for recommending other content the user or users may favor.
cluster - a collection of content items deemed to be statistically similar based on tags
editors - editors don’t have full administrative rights to the application, but they can perform tasks that sweepers can’t.
turbine - another word for plugins for SwiftRiver applications
impulse turbine - plugins that pre-process content (before the application receives it). Impulse Turbine plugins affect how data is structured as part of the Swift object module.
reactor turbine - plugins that process content based on human interaction or assigned logic (after the application has received it). Reactor Turbine plugins can be used to take structured data and do something with it.
parsers - at the application architecture level, parsers are modules that can be written to create new sources
trusted source - applies a default score of 100 to a source, so that users vote down from a high score rather than up from a low one, ex. you have my trust now but could lose it over time

IV. Explaining the Sweeper UI

Now that we’ve got the basics, we can walk you through the Sweeper user interface and its basic features and functions. At first look the application can be a little intimidating, so hopefully this guide takes the edge off (like a martini!).

Analytic Dashboard

This dashboard offers a quick survey of the content being collected by Sweeper. Where is most of the data being collected from? How much content is there in total? How much from each channel? The charts are dynamic and update with each use of the application.

Main Content Window

Below you see the main content display window. This is where aggregated content can be viewed.

Admin Panel

This area contains five tabs: Login, Impulse Turbines, Reactor Turbines, Sources and Add User.

Login - as you might expect, this area allows users to log in to the application
Impulse Turbines - for enabling or disabling impulse turbine plugins
Reactor Turbines - for enabling or disabling reactor turbine plugins
Sources - the area where one can add sources to aggregate into Sweeper
Users (Add User) - the area for adding users and assigning their administrative rights

View Tabs

This area contains several tabs for altering the view of the main content window. The titles are fairly self-explanatory: Dashboard, New content, Accurate, Inaccurate, Crosstalk, Irrelevant.

Dashboard - contains a collection of charts plotting various aspects of the content being collected
New content - for viewing new content as it’s being collected
Accurate - shows all content voted up
Inaccurate - shows all content voted down
Crosstalk - shows content that is completely off-topic
Irrelevant - shows content that is on-topic but not relevant to the user’s specific needs

Filter Panel

Filters for changing the view of the main content window.

Veracity Slider - allows the user to set any range between 1 and 100 to view content by assigned score
Channels - view only the content that came in on a particular channel
Tags - view only the content containing a selection of tags

Refresh Staging Area

Reveals how much content has been aggregated since the main content window was last refreshed.

Rating Panel

The upper left part of the Rating Panel is for quickly determining information about content: is this a ‘trusted’ source, or has it been rated as trusted by the people within your bounded (or unbounded) group of users? The upper right quadrant shows a score that represents the favor the user or their community has for the associated source. The lower quadrant holds four buttons. Here is what they essentially do:

Green (Up) - expresses favor for a content item while positively affecting its source’s score, so that in the future content from the same source will be prioritized.
Red (Down) - expresses disapproval of a content item while negatively affecting its source’s score, so that in the future content from the same source will be deprioritized.
Crosstalk - expresses that this content is not relevant because it was essentially collected by mistake and is not useful. Removes it from the main view without negatively affecting the source score.
Irrelevant - expresses that this content is not germane to the task the user is trying to perform and, more importantly, is somehow damaging or distracting. Removes the content from the main view while negatively affecting the source score.

It’s important to note that these votes, whether up or down, are not the only things factored into the scoring of content. We also factor in things like the tag profile of the content, the ratings of the individual users doing the rating, and other factors. For an in-depth explanation, see the RiverID System Guide.
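As a rough mental model of how the four buttons relate to a source's score, consider the following Python sketch. It is deliberately simplified and is not the RiverID scoring algorithm: the step size, the neutral starting score of 50, and the names `Source`, `trusted_source` and `apply_vote` are all assumptions made for illustration. It also reads the Irrelevant button as lowering the source score, in contrast to Crosstalk.

```python
from dataclasses import dataclass

STEP = 5            # assumed size of a single vote; the real weighting is not documented here
NEUTRAL_SCORE = 50  # assumed starting score for a source that is not trusted


@dataclass
class Source:
    name: str
    score: int = NEUTRAL_SCORE


def trusted_source(name: str) -> Source:
    """A trusted source starts at the maximum score of 100 and can only lose favor."""
    return Source(name, 100)


def apply_vote(source: Source, vote: str) -> int:
    """Adjust the source score for one vote and return the new score (kept within 1..100)."""
    if vote == "up":
        delta = STEP
    elif vote in ("down", "irrelevant"):
        delta = -STEP
    elif vote == "crosstalk":
        delta = 0  # collected by mistake, so the source is not penalized
    else:
        raise ValueError(f"unknown vote: {vote}")
    source.score = max(1, min(100, source.score + delta))
    return source.score
```

In the real application a vote is only one input among several, so the actual change to a score will not be a fixed step like this.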

Content Items

Content items are divided into three sub-sections: the Header, the Body and the Footer. In the Header you’ll find an icon denoting which channel this content came in on: Twitter, Email, SMS, or RSS/Atom. Clicking this icon opens a pop-up display with information about the source and the content itself:

Source - the source of the content (a Twitter @name, email address, URL or phone number)
Channel - the channel the content came in on (Twitter, Email, SMS, or RSS/Atom)
Source Score - the trust score associated with this source
Link - hyperlink to the original content

In the Body you’ll find a portion of the message (for Twitter and SMS) or the headline/subject (for Articles, Blogs and Email).

In the Footer you’ll find tags, which add a layer of taxonomy to the content. You can quickly find other content like this particular content item by clicking on the tags themselves. Users can also add their own tags*, edit tags* or delete tags to help the system improve**.

* Adding tags and editing tags is not possible in v0.3.0 of the Sweeper UI. However, a slight modification of the code exposes this feature and makes it available.
** There is an active learning element of our Tagging API, available soon, that allows the system to learn from user feedback. You can read more about this in the section on Impulse Reactor Plugins.

V. Overview of Plugins

A few plugins ship with Sweeper and are either enabled by default or commonly used. There are far too many plugins to list here, so in this section we’ll explain what a few of the available ones are and what they are used for. You can always find more plugins for Swiftly applications at http://plugins.swiftly.org

Duplicate Content Filter

When activated, this plugin passes all content through the Duplication Filter API in the Swift Web Service stack, effectively removing all duplicate content (like retweets) from a feed.
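The Duplication Filter API itself is not documented in this guide, so the following Python sketch only illustrates the general idea of removing retweet-style duplicates. It uses a simple technique (normalize the text, keep the first occurrence) that is an assumption, not a description of what the web service does; `normalize` and `drop_duplicates` are hypothetical names.

```python
import re
from typing import Iterable, List

# Matches one or more leading "RT @name:" prefixes, in any letter case.
_RT_PREFIX = re.compile(r"^(rt\s+@\w+:?\s*)+", re.IGNORECASE)


def normalize(text: str) -> str:
    """Strip retweet prefixes, collapse whitespace and lowercase the text."""
    text = _RT_PREFIX.sub("", text.strip())
    return re.sub(r"\s+", " ", text).lower()


def drop_duplicates(texts: Iterable[str]) -> List[str]:
    """Return the texts in order, keeping only the first item for each normalized form."""
    seen = set()
    unique = []
    for text in texts:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Exact matching after normalization misses near-duplicates such as edited retweets; a production filter would likely need fuzzier matching.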

Google Language Services

When activated, this plugin passes all content through the Google Translate API. Google Translate will automatically detect what language the content is in, translate it and send it back. This allows you to aggregate content in multiple languages but only see the resulting translated, English content! This is a huge time saver when doing international research.

But how do you know what content has been translated? When the plugin is activated, additional info in the content item’s header lets the user know what has been translated, and from what language. See the example above. If you expect large amounts of data, you may want to opt for the Google Enterprise Language Service plugin instead. With this plugin the amount of content that can be translated is increased significantly. It requires an API key from Google. If you need help getting Enterprise level access, contact us at [email protected]

Geo-Location (Yahoo)

When activated, this plugin passes all content through the Yahoo Placemaker API, which tries to detect the location where the content is likely to have originated. We then apply lat/lon coordinates to the content, which are stored as part of the content meta info. When passed to other systems, this lat/lon info can be used for geo-spatial reference. To use this service, you’ll need to acquire a Yahoo Placemaker API key from Yahoo. If you need help getting Enterprise level access, contact us at [email protected]

Tagging

When activated, all content passing through Sweeper will be tagged by our natural language processing API. Essentially this service tries to extract what it thinks are the active keywords being used, and uses them to help the user automatically sort content. Tags are very important to SwiftRiver, and we take a dual taxonomic and folksonomic approach in our applications. That is, although these tags are machine generated, they can be edited and improved upon by humans, which in turn helps to teach the algorithm how to tag content better.

Ushahidi Push

For users of Ushahidi or Crowdmap. This will take any content voted up in the Ratings panel and automatically plot it on a designated Ushahidi deployment map as an approved report. This is a significant time saver for large groups who want to use Sweeper to curate data, but use Ushahidi or Crowdmap to visualize it.

Users will need to enter an API key for an Ushahidi deployment that they have administrative rights to, ex. http://xxx.xxx.xx.xxx/ushahidi/

There are many variants of this plugin. One is called Ushahidi Passive Push, and essentially it turns Sweeper into a cron suite where content is automatically aggregated, structured, and passed along to Ushahidi, mostly without any human operators!

Tag Clustering

When activated, this plugin allows the user to view content similar to any particular content item. The clustering is done by using a statistical profile of the associated tags for proximity matching. This gives the user more control than alternative recommendation methods, because it can factor in the user’s own tagging methods. For instance, if I use unique identifiers or words unique to my organization, they too can be used as part of the proximity matching algorithm!
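The guide does not specify which statistical profile the plugin uses. As a stand-in, the Python sketch below uses Jaccard similarity (shared tags divided by all tags) to show how tag sets alone can drive proximity matching. The function names and the 0.5 threshold are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Similarity of two tag sets: size of the intersection over size of the union."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similar_items(target_tags: Iterable[str], items: Dict[str, Set[str]],
                  threshold: float = 0.5) -> List[str]:
    """Return titles of items whose tags are close to `target_tags`, most similar first."""
    scored = [(jaccard(target_tags, tags), title) for title, tags in items.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ranked if score >= threshold]
```

Because the comparison works on whatever tags are present, organization-specific tags added by users influence the result just as machine-generated ones do.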

Annotations*

Annotations offers the ability to annotate any content item. This can be used to leave individual notes for reference, or to converse collaboratively around content with your team.

Quiver/Bookmarking*

Quiver is a bookmarklet that lets you quickly collect content from around the web and post it to your Sweeper deployment (effectively adding it to your quiver). This can be useful for individually collecting research, or if you have teams of contributors actively recommending content for you to then apply all our contextual APIs to.

* These features will ship with the forthcoming release of Sweeper.

VI. Adding Sources

To begin using Sweeper at all, one must begin aggregating from predefined sources. Essentially this is where you tell the system what you want to track. Sweeper currently only accepts inputs that are updated streams of data (feeds) in XML/ATOM/RSS or JSON format. To get any other kind of content into Sweeper, all one needs to do is write a parser: a few lines of code that tell the application how to structure data coming from that particular feed. The types of content natively supported are IMAP, Gmail, FrontlineSMS, Google News, any RSS or Atom feed, Flickr, other SMS gateways and Twitter.
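To give a feel for what "a few lines of code that tell the application how to structure data" might look like, here is a rough Python sketch that turns an RSS document into a list of content items. Sweeper's actual parser interface is not covered in this guide; the function name `parse_rss` and the dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss(xml_text: str) -> List[Dict[str, str]]:
    """Structure each <item> of an RSS document as a simple content-item dictionary."""
    root = ET.fromstring(xml_text)
    # The feed's own link stands in for the "source" of every item it contains.
    source = root.findtext("channel/link", default="")
    items = []
    for node in root.iter("item"):
        items.append({
            "channel": "rss",
            "source": source,
            "title": node.findtext("title", default=""),
            "link": node.findtext("link", default=""),
        })
    return items
```

A parser for a new kind of feed would follow the same shape: read the raw input, then emit one structured record per content item.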

Email (IMAP)

Sweeper will accept the IMAP details of any email account and begin pulling in content allowing you to aggregate, translate, tag and cluster your email.

Email (Gmail)

Sweeper supports aggregating email from any Gmail account, pulling in content and allowing you to aggregate, translate, tag and cluster your email. Although Gmail also supports IMAP, the native Gmail aggregation is recommended.

FrontlineSMS

In combination with FrontlineSMS, Sweeper can become a powerful SMS curation service that aggregates real-time content (SMS) even if there is no Internet connection! There are two ways of integrating FrontlineSMS with Sweeper: Remote and Local.

The remote option is for users who have access to some type of network, whether via the Internet or just a LAN. Simply enter the details of the FrontlineSMS deployment you want to pull data from. You will need to use this in combination with the FrontlineFetch go-between servlet, which can be downloaded from http://plugins.swiftly.org/?p=51.

The local option requires that the Sweeper deployment and FrontlineSMS be installed on the same machine or server. This allows the Sweeper application to pull directly from the FSMS database, and it will work even if there is no Internet connection.

News & Blog Search

This source module allows you to set up a keyword search, returning real-time search results from Google News, Posterous, Blogger and Wordpress.com. The results will appear in the main content view, translated if necessary.

RSS/ATOM

Self-explanatory: simply enter the URL of any RSS or ATOM feed and Sweeper will begin aggregating that content.

Flickr

This service allows the user to aggregate content from the photo-sharing service Flickr.

The options are fairly simple. Tag Search will return results aggregated from Flickr based on a search using a specific keyword ex. cats, dogs, Eiffel Tower. Tag Search with Location will only return geo-tagged results, great when used in combination with a mapping platform like Crowdmap. Follow User is for only returning the results from a specific user account.

SMS Gateways

We’ve included a generic SMS gateway aggregator. It’s set up to read from the HTTP posts commonly used by services that don’t have APIs. However, it’s there largely to fork and modify - a head start on integrating your own SMS service.
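If you do fork the generic aggregator, the core job is to read a form-encoded HTTP POST body and turn it into a content item. The Python sketch below shows that step in isolation. It is not the code that ships with Sweeper, and the field names `from` and `message` are assumptions; gateways differ, so they are left as parameters.

```python
from typing import Dict
from urllib.parse import parse_qs


def parse_sms_post(body: str, sender_field: str = "from",
                   text_field: str = "message") -> Dict[str, str]:
    """Turn a form-encoded POST body from an SMS gateway into a content-item dictionary."""
    fields = parse_qs(body)
    sender = fields.get(sender_field, [""])[0]
    text = fields.get(text_field, [""])[0]
    if not sender or not text:
        raise ValueError("POST body is missing the sender or the message text")
    # The phone number is the source; SMS is the channel.
    return {"channel": "sms", "source": sender, "text": text}
```

Adapting this to a particular gateway is mostly a matter of passing that service's own field names.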

Twitter

Culling content from Twitter is easy. There are two options: Search and Follow User.

With Search, the user enters a name for the search (a name that has relevance to you) followed by the term(s) they would like to search for. These can be common words or hashtags, ex. ‘My Twitter Search’ and ‘#searchword’. There is no limit to the number of search queries one can have; however, the return of results is limited by your individual access to the Twitter search API. If you’d like to increase this access, contact Twitter to get white-listed or contact [email protected].

Note on Sources and Search: When using a Twitter search, please note that the search itself is not a source. In the Swiftly eco-system, content producers are sources. This means that we will identify all the individual content producers and help you keep track of them. This allows one to monitor conversations around keywords that might lead to great content producers.
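The distinction between a search and its sources can be shown with a small Python sketch. It assumes search results arrive as (author, text) pairs, which is an illustrative simplification and not the format of the Twitter search API; `sources_from_search` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sources_from_search(results: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """List the distinct content producers found in search results, most active first.

    Each result is an (author, text) pair; author names are compared case-insensitively.
    """
    counts = Counter(author.lower() for author, _text in results)
    return counts.most_common()
```

The search term only decides which items are collected; the entries in the returned list are what would be tracked and scored as sources.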

With Follow User, the user enters a unique name for the Twitter handle they want to follow along with the actual @name on Twitter, for example ‘Bob Smith, Rwanda’ alongside ‘@bobsmith’. This is helpful because it lets you leave notes, for yourself or your team members, about who you are following.
