Ten Ways to Use SwiftRiver



On August 30th we’ll release the first public beta of the SwiftRiver platform, an open source toolkit of semantic web technologies. It’s been a busy few months as we’ve been working round the clock to bring you a solid product.

One of the questions I’m asked frequently is “What can I use SwiftRiver for?” Here are a few examples:

1. Monitoring Real-Time Conversations

The most obvious use of Swift, and what organizations like Ushahidi will use it for, is to help manage large streams of real-time content. Whether it arrives from blogs, Twitter, email, SMS or other means, when something happens (e.g. the Haiti earthquake, or Kanye West and Taylor Swift at the VMAs) there’s a flurry of activity immediately following the event. We envision that someone collecting news on that subject, for whatever reason, would download an instance of Swift and begin monitoring a number of sources discussing the event.

2. Brand Monitoring

Similar to the scenario above, a PR firm or advertising agency might use Swift to monitor mentions of a company’s brand online. This would of course include mentions on Twitter, but might also include SMS and email campaigns.

3. Data Collection / Research

As a research tool, Swift’s veracity algorithms can be used to curate sources and content that the user trusts to offer more accurate information.

4. Sweeping through Email

Even if you don’t understand anything about the real-time web or the aforementioned ideas, one thing almost everyone can relate to is the need for better email filtering. Swift is something anyone can set up to help sort their email by ranking the people they are likely to want to hear from higher than the people they aren’t. Meanwhile, users can apply language processing tools to automatically sort email by subject, category or sender.

5. Sweeping Through SMS

Even users who don’t deal with the web at all may find use in SwiftRiver. For one, not all real-time data is online. If you’re on a closed network, you can use it to process text messages received from a local gateway. This is useful for users of tools like Frontline SMS or Kannel.

6. Creating a Public Aggregator

One of our pilot partners used Swift to create a public ‘planet’ style aggregator and news portal. This required some custom work on our end, but we’re excited for their launch.

7. Monitoring Hundreds of Blogs/Sources

Perhaps you’re just a person, a blogger or journalist, who consumes large amounts of information on a number of subjects, like me. I currently follow about 2000 blogs in Google Reader. Reader is extremely useful because I can aggregate whatever I want. From the aggregated datasets, I can then choose to read and share whatever I want. Likewise, in Swift having too much information is actually a good thing: there are still serendipitous ways of navigating content (using tags), as well as a number of filters for viewing items in a more structured manner.

8. Building Apps on SWS

A few days ago we received a number of tweets about an app called FlipBoard, asking if Swift was anything like it. SwiftRiver is actually a very different animal. We’re more like the stack that something like FlipBoard would be built on. We offer several advanced tools (social graph mining, natural language processing, location servers, Twitter analytics) for free use via our open API platform, Swift Web Services. Anyone can use them, and thus anyone can build applications on top of them.

We’ve been working with large media organizations around the world to customize such tools for their needs but because our stack is open, so can you!

9. Dashboard and Shared History Across Media Channels

The most basic feature that makes many of the above possible is that Swift allows you to create a dashboard that includes messages from a number of sources and lets you sort, search and curate them all any way you want. This might include videos, tweets, email, text messages and blogs: any content you need to mine for information, for any reason.

10. Improving Your Blog

In addition to using Swift to collect research, bloggers are using Swift Web Services for their blogs. Users of Wordpress or Drupal can add features like auto-tagging and more using Swift Web Services.




These are just some of the ways our alpha testers have been using Swift. There are many more possibilities, and we look forward to exploring them after our Beta! To find out more about Swift, try these recent posts from Robert Scoble, the BBC and GigaOM.

SwiftRiver Update

SwiftRiver at TED

For the past two weeks I’ve been in the UK doing quite a bit of work to answer questions, conduct interviews and even give a few talks about the SwiftRiver platform. I hosted our second SwiftRiver 101 in central London and held private sessions with a number of media groups interested in finding out more about the platform and its capabilities.

I had the pleasure of speaking with Jon Fildes from the BBC, who published an interview with Erik Hersman and me this morning. The above pic is from a short TED talk I gave on Swift just last week. Those talks are slowly finding their way online, so keep watching TED.com for their release.

Read the BBC Profile on SwiftRiver.



For those of you watching attentively, you may have noticed we missed our last release. As we’re getting close to our Beta, we’re focusing less on big public releases and more on the iterative updates that can be found on our Github account.

SwiftRiver 101 Recap

Yesterday we held our very first SwiftRiver 101, which saw an audience of between fifty and sixty people descend upon the iHub to find out the basics of the SwiftRiver platform, as well as technical details like installation, core code and information about Swift APIs and the plugin framework. This included representatives from Google, Datadyne, NDI, Open Street Map and a number of other organizations.

Director/System Architect Jon Gosier and Lead Developer/Technical Architect Matthew Griffiths led the day’s presentations. Please see the videos below…

Presentation 1 - Platform Overview

An explanation of the whole SwiftRiver ecosystem by Jon Gosier.

SwiftRiver 101 Session 1 from Ushahidi on Vimeo.



Presentation 2 - Swift Web Services

Detailed explanations of the Swift Web Services: RiverID, SiLCC, SULSa, Reverberations, and SiCDS by Jon Gosier.



Technical Breakout Panel

Detailed explanations of SwiftRiver’s code, redundant data abstraction layers and API by Matthew Griffiths.



Update: Slides from the day. Platform Overview and Web Service Overview

Latest Getting Started with SwiftRiver Video

Are you using the SwiftRiver alpha release? If so, you may find the latest Getting Started video useful.

SwiftRiver in Plain English

This week, on Wednesday 16th, we’ll be hosting an in-depth overview of the SwiftRiver software and SwiftRiver Web Services platforms in Nairobi, Kenya. You can register to attend that event (or the live stream) here. Technical Architect and Lead Developer Matthew Griffiths and I are really looking forward to displaying some of the technical aspects of the work we’ve been doing over the past few months. However, we realize not everyone is a developer, and some people want a more accessible description of the project.

That said, here’s all you need to know about SwiftRiver (even if you don’t really care what the word veracity means)…




SwiftRiver (http://swift.ushahidi.com) is a free and open source software platform that uses algorithms and crowdsourcing to validate and filter news.

Features

  • Aggregate content from many sources (Email, Blogs and News, Twitter, SMS)

  • Rate content sources (Email address, URLs, Twitter User, phone number)

  • Sort content by authority (more trustworthy or less trustworthy)

  • Post reports from Swift to Ushahidi with 1-click

  • Plugin Architecture

  • Swift Web Services already integrated (SwiftRiver does all the things SWS does)



Other Notes

  • Open Source

  • Easy to theme

  • Allows instance admins to determine the sources they trust

  • Speeds up the process of sorting through large streams of data

  • Helps Ushahidi users manage news

  • Needs to be downloaded and run from a web server (like Ushahidi)






Swift Web Services (http://sws.ushahidi.com) is a suite of web APIs. Each application does something different, and each is independent. SwiftRiver uses them all.

Applications

  • SiLCC pulls keywords from any text (including SMS and Twitter) and automatically sorts related text

  • SULSa automatically detects location of incoming content/reports

  • SiCDS automatically filters out duplicate content (re-tweets, blogs, text messages)

  • Reverberations detects how influential/popular content is online

  • RiverID allows Swift users to carry their Swift reputation with them across the web



Additional Notes

  • Cloud applications (no download, no install)

  • Web APIs

  • Open-source

  • Free for anyone to use

  • Premium options available

  • RESTful (for other applications)

  • Fully integrates with Ushahidi

  • Hosted (no download, no install)

  • Available for Wordpress and Drupal

SwiftRiver v0.2.0 Batuque Released

batuque

SwiftRiver is Ushahidi’s software platform for managing large streams of data. Over the past 30 days we completely rewrote the SwiftRiver app to make it faster, leaner and easier to use. Because of the rewrite, some things like the veracity slider have been removed, but expect them to return in the next release, version 0.3.0 Benga.

This latest release is the most stable, most exciting build released to date. There are a lot of new features in this release; here are some highlights…

TOTAL REWRITE

We completely rewrote the core app from scratch. This allowed us to move up to Kohana 3.x and get rid of a lot of unused code. To give you an idea of how much this helped: the last release shipped with a file size of around 10 MB, while this one is only 3 MB!

The drawback was that we lost some functionality. User roles, pairing with Ushahidi, the veracity slider, and other things will return in version 0.3.0 Benga.

ONE PAGE WORKFLOW

The latest Swift release really emphasizes speed of workflow. To that end, we’ve completely gotten rid of the idea of having one page for working and another for administering the site. All the action, including administrative functions like activating plugins, now happens on one page. There’s also no longer any pagination; the work area automatically refreshes and loads new content.

ADMIN BAR

All administrative functions (activating plugins, changing themes) are now controlled by an administrative bar located at the top of the page.

TURBINE

Turbine is our plugin platform. It’s now documented and ready for use. There are two types of Turbine plugins: impulse and reactor. As the name implies, impulse plugins process data before it hits the database. An example would be a plugin that translates text or pre-processes feeds. Reactor plugins process content after it reaches the database, which lets them take advantage of things like structured data and user interaction. Some plugins will be both impulse and reactor types.
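To make the impulse/reactor distinction concrete, here is a minimal sketch in Python. It is not Turbine’s actual API (the app itself is written on Kohana, and the real plugin interface is in the documentation); the function names, the dictionary-based item format and the in-memory list standing in for the database are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Plugin = Callable[[Item], Item]


def strip_whitespace(item: Item) -> Item:
    """Hypothetical impulse plugin: clean up text before it is stored."""
    return {**item, "text": str(item["text"]).strip()}


def count_words(item: Item) -> Item:
    """Hypothetical reactor plugin: annotate an item that is already stored."""
    return {**item, "word_count": len(str(item["text"]).split())}


def run_pipeline(raw_items: List[Item],
                 impulses: List[Plugin],
                 reactors: List[Plugin]) -> List[Item]:
    database: List[Item] = []  # stand-in for real storage

    # Impulse stage: every plugin sees the item before it is saved.
    for item in raw_items:
        for plugin in impulses:
            item = plugin(item)
        database.append(item)

    # Reactor stage: plugins work on the structured, stored content.
    results: List[Item] = []
    for stored in database:
        for plugin in reactors:
            stored = plugin(stored)
        results.append(stored)
    return results
```

Calling `run_pipeline([{"text": "  hello world  "}], [strip_whitespace], [count_words])` gives back a single item whose text has been trimmed and which carries a `word_count` field. A plugin that is both impulse and reactor would simply be registered in both lists.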

SHIPS WITH PLUGINS

Some plugins will be packaged with future releases, others will be downloaded from our website. This release ships with SiLCC and TagThe.Net, two auto-tagging services that are examples of impulse plugins.

More release notes for this version can be found at http://swift.ushahidi.com/doc/

Download v0.2.0 Batuque: ZIP

SwiftRiver Releases Plugins for Wordpress



For all you Wordpress publishers out there interested in SwiftRiver, there are two official plugins we’re releasing today that bring Swift to your platform of choice: WP-SiLCC and WP-Veracity.

WP-SiLCC



WP-SiLCC is an auto tagging plug-in. Users who run news sites or aggregators should consider using this to add a basic level of taxonomy to all posts. WP-SiLCC also allows users to tag their own posts for sites that prefer a more folksonomic approach. WP-SiLCC uses active learning techniques to improve how it parses text over time.

Download WP-SiLCC from Wordpress.org

WP-Veracity



WP-Veracity applies Bayesian algorithms to your content to help surface posts based on “interestingness”, influence and time published rather than popularity alone. From SwiftRiver’s perspective, popularity is only an indicator of influence, not necessarily an indicator of authority. This plug-in combines popularity (number of hits, trackbacks, comments), a Bayes score and time (older content falls off organically) to offer a better picture of the most interesting posts on your blog at any given time.
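As a rough illustration of how those three signals could be combined, here is a small Python sketch. It is not the plug-in’s actual formula: the logarithmic damping of popularity, the multiplication of the factors and the seven-day half-life are all assumptions chosen to show the idea that popularity alone should not decide the ranking.

```python
import math


def interest_score(hits: int, trackbacks: int, comments: int,
                   bayes_score: float, age_days: float,
                   half_life_days: float = 7.0) -> float:
    """Hypothetical 'interestingness' score for a single post.

    bayes_score is assumed to be a probability between 0 and 1 that the
    post is interesting, produced by some classifier outside this sketch.
    """
    # Dampen raw popularity so a huge hit count cannot dominate.
    popularity = math.log1p(hits + trackbacks + comments)
    # Older content falls off: the score halves every half_life_days.
    decay = 0.5 ** (age_days / half_life_days)
    return popularity * bayes_score * decay
```

With these assumed weights, a fresh post with 50 hits and a Bayes score of 0.9 ranks above a fresh post with 1000 hits and a Bayes score of 0.2, and the same post scores lower at two weeks old than on the day it was published.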

Download WP-Veracity from Wordpress.org




For developers interested in creating their own plugins using Swift Web Services, visit our documentation wiki.

SwiftRiver Web Services Launches



The SwiftRiver Web Services platform offers RESTful apps that live in the cloud that we encourage other developers or applications to utilize. These services are diverse and powerful ways to improve data collection and management.

For non-profits and NGOs working in the field who may be worried about connectivity or security, all SWS Apps are also open source which means they can be run on your own servers or completely offline.

The first of these web services available is OpenSiLCC. OpenSiLCC allows users to parse and categorize any text on the fly. We are also developing open source applications which exemplify use. They’re potential building blocks for your ideas, with code to help get you started. One of them, Abraxas, is live and can be found here. Get the Abraxas code.

To sign up, visit http://sws.ushahidi.com. What are some potential use cases for OpenSiLCC?


  • SMS messages coming from Frontline or Clickatell could be tagged and categorized in real-time.

  • Users could aggregate non-tagged data (say from Twitter), parse, and output feeds with tags.

  • Develop your own glossaries and text parsers for content unique to your organization (or language).

  • Identify relationships between seemingly disparate message types (email, sms, twitter).



Sign Up For Web Services



Read the related post “Taxonomy for Text Messages”.

The next version of SwiftRiver (0.2.0 Batuque) will ship with these services (OpenSiLCC and others) fully integrated.

Visualizing Redundant Data Validation

data visualization

The following visualizations represent the various methods that go into calculating the reputation and veracity scores for users and content within the SwiftRiver platform. They are in part a response to this comment from reader Charles Bernard on this post. His comment:

In many instances, there are entities with a vested interest in preventing valid information regarding things such as voting, battles and even disasters, both natural and man-made.

For nearly any human effort, there exist a group of entities which would profit by either the details or the extent of a problem being kept from the public–and that can include relief agencies.

While tracking particular sources and their validity of reports is a step in the right direction, some entities, in particular governments and large corporations have access to the resources needed to generate thousands or even 100,00s of thousands of false data reports, flooding the system with misinformation.


In other words, what steps are we taking to prevent individuals with malicious intent from gaming SwiftRiver? Here was my response:

With Swift, we aren’t just validating content, we’re also validating users: users validate each other and content validates users. Content can also be used to verify other content. This creates a system that’s difficult to dupe, as anyone looking to falsify information would need thousands of false reports from a number of different ‘users’, locations, and media channels.

What would be absolutely possible is for a group to download Swift, set up their own instance with all sorts of fake information and publicize it as fact. However, our distributed, decentralized reputation system River ID would show that outside of that instance’s ‘ecosystem’ no one trusts those users, or the instance. If the administrators opt out of tracking…they also forfeit any sort of benefits that come from River ID (trust from users who don’t know you or your site). In this case falsifying information is indeed easy, but promoting it becomes self-defeating, as the more people who aren’t under your influence see it, the less authority your Swift instance (with all its fake reports) actually holds.


I thought these concepts might be hard to grasp, so I made the following Arc Diagrams to give a visual representation of what I actually mean. Click the images for high-resolution versions. In the images below, the light grey color is simply used to indicate that content isn’t important for what that particular chart is showing you.

voting

Fig. 1 Individual Voting Against the Community

Figure 1 represents the most classic scenario of ‘gaming’, spam, bots or human individuals who are trying to vote bogus content ‘up’ so it will be weighted higher than other content. Section “A” represents User 1. Section “B” represents the activity of User 2 (our spammer). Section “E” represents the community within this particular Swift instance. Section “F” represents the users of our distributed trust system River ID or the global SwiftRiver economy. Section “C” represents individual content items. Section “D” represents the source that content is coming from.

The thickness of the lines connecting the users to the content and the source represents how they’ve voted on those particular things. The thickness of the line for User 2 tells us that he’s rating these things very highly. Perhaps they come from his blog, and he wants them at the top! The thickness of the lines from the local community of the SwiftRiver instance as well as the global users tells us that these content sources are suspect. We can see that User 1 (who represents our average, active user) is voting closer to how the community is voting; in fact, User 1 rates both the content and the source even more harshly than the community does (represented by thinner lines).

This dynamic relationship between users and their interactions with content (in contrast to the local and global community) is considered when scoring users, content, and the sources. In this case the person voting against the tide is actually damaging his or her own reputation both locally and globally. However, this isn’t the only thing we consider, otherwise it would encourage conformity which also isn’t good (sometimes the outlier knows something the rest don’t.)
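One way to picture the ‘voting against the tide’ signal from Figure 1 is as a simple agreement measure. The Python sketch below is only an illustration and not SwiftRiver’s scoring code: the 0-to-1 rating scale, the item names and the averaging are assumptions, and, as noted above, a real score would blend this with other factors so that it does not simply reward conformity.

```python
def agreement(user_votes, community_votes):
    """Mean closeness (0 to 1) between a user's ratings and the community's.

    Both arguments map an item id to a rating between 0 (distrust) and
    1 (trust). Returns None when the user has rated nothing in common.
    """
    shared = [item for item in user_votes if item in community_votes]
    if not shared:
        return None  # no overlap, nothing to compare
    closeness = [1.0 - abs(user_votes[i] - community_votes[i]) for i in shared]
    return sum(closeness) / len(shared)


# Hypothetical ratings mirroring Figure 1: the community is suspicious of
# both the content and its source, User 1 is harsher still, User 2 (the
# spammer) rates everything as highly as possible.
community = {"post-1": 0.2, "source-1": 0.3}
user_1 = {"post-1": 0.1, "source-1": 0.2}
user_2 = {"post-1": 1.0, "source-1": 1.0}
```

Here `agreement(user_1, community)` comes out far higher than `agreement(user_2, community)`, which is the kind of gap that would count against User 2’s reputation both locally and globally.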

voting

Fig. 2 Factors Considered in Rating Content

In Figure 2 we can see that things like Time, Location and Activeness, as well as Global and Local interaction, are all considered. Time (green) and Location (dark grey) are optional factors suited to scenarios like a conflict or war, where the content producer’s location, or proximity to ‘ground zero’, is factored into the score. The length of time between the initial event and when content is produced may also tell us a lot. ‘Time’ and ‘location’ are optional because if your Swift instance is tracking something like a political scandal, time and proximity may not actually add any value to authority calculations.

Purple represents how active Users 1 and 2 are. In and of itself, how much someone uses a Swift instance is irrelevant. It could mean that they are an eager member providing valuable assistance, or it could mean they are attempting a brute force attack on the system similar to the Figure 1 scenario. However, when coupled with other factors, frequency of interaction is considered and can positively or negatively weight the score for a user.

voting

Fig. 3 Ratings Visible to Users

In Figure 3 I’m illustrating what information is visibly shared in the scenarios above. The trust the local community has for Users 1 and 2 is displayed. The trust the global RiverID system has for Users 1 and 2 is also displayed. Thus, the trust Users 1 and 2 should have for each other is inferred.




Swift’s strength is in multiple points of redundancy. All scores are calculated against a multitude of other factors which may or may not be independent of the local community. This allows users to build scores more organically than x=bad y=good. There are some probabilistic calculations as well as algorithmic intricacies that make all this a lot more complex (a lot of math beyond my paygrade). We also calculate things like tags and content influence, which compound the complexity.

Unless the local Swift instance administrators opt in to participating in the global Swift ecosystem, their instance only holds authority with the people using it. In theory, their ‘gaming’ would then be contained to their local Swift instance. The fact that global authority isn’t considered would be an indicator that the public shouldn’t trust it. If they do opt in to the global ecosystem, it becomes increasingly hard to continue gaming the system, as their scores are constantly weighted against the global community’s.

Because Swift is open source, it’s easy to reverse engineer or hack parts of the local system. But this is why we announced Swift Web Services last month: core components of the global system are centralized and well protected. This protects the global ecosystem, but still allows for independent uses of SwiftRiver and all of its components as open, locally deployable apps. Some users, for example election monitors, may not want their SwiftRiver instance online at all. In that case, global authority doesn’t matter; the instance can and should only be influential amongst the people using it. This is why we opted for cloud solutions in addition to local deployment options, yet another redundancy to ensure the platform’s usefulness in multiple scenarios.

Post any follow up questions to the newsgroup or in the comments below.

Taxonomy for Text Messages

Getting crowd-sourced information into a system is only the first hurdle; the next is managing it. Last week we announced Swift Web Services, RESTful applications hosted in the cloud that any third-party application or developer can use to assist in managing data. One of those services is SiLCC, a semantic tag extraction service for parsing text and extracting relevant keywords from tweets and text messages: tags like the names of people and places, actions that need to be taken or locations where things have occurred. It is an open service that we host on our servers, meaning anyone can use it in their applications. It will work with WordPress, Drupal, Frontline SMS, other aggregators like Managing News and more.

These other applications would send the SiLCC API a feed of content they want tagged. SiLCC then extracts keywords and returns a feed of tags linked to the content they refer to. From there, the tags can be used however the original app developers decide.
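The shape of that exchange, content in and tags linked to content out, can be sketched in a few lines of Python. This is a toy stand-in and not the SiLCC API or its algorithm: the `id`/`text`/`tags` field names, the tiny stopword list and the ‘keep the first few non-trivial words’ rule are all assumptions for illustration.

```python
import re

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"a", "an", "and", "because", "in", "of", "on", "the", "to"}


def extract_tags(text, max_tags=5):
    """Return up to max_tags distinct keywords, in order of appearance."""
    words = re.findall(r"[a-z']+", text.lower())
    keywords = [w for w in words if w not in STOPWORDS and len(w) > 3]
    # dict.fromkeys removes repeats while preserving order.
    return list(dict.fromkeys(keywords))[:max_tags]


def tag_feed(feed):
    """Take a feed of items and return tags linked back to each item id."""
    return [{"id": item["id"], "tags": extract_tags(item["text"])}
            for item in feed]
```

Passing in `[{"id": 1, "text": "Flight cancelled because of the volcano in Iceland"}]` returns one entry for id 1 with the tags `flight`, `cancelled`, `volcano` and `iceland`, which the calling application is then free to store, display or filter on.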

tufts

For many organizations, this is a critical time saver. It saves people from having to comb through a system to find useful content. Aggregating content in an Ushahidi instance that uses SiLCC, or in SwiftRiver, would bypass that manual sorting, allowing users to focus on verifying reports and responding to urgent requests.

Tags are the first, autonomous layer of taxonomy for content. They won’t be the only layer, but if you’re monitoring 100 different mobile phones sending in messages about a volcanic eruption in Iceland and you’re looking for the ten that reference one particular cancelled flight, this is one of the quickest ways to couple disparate items.

280 Characters or Less

A number of services out there offer similar functionality; in fact, we recently partnered with Thomson Reuters, which offers a service called Open Calais that extracts semantic keywords from articles and blogs. Where Open Calais doesn’t work so well is with shorter messages that are less than a paragraph in length. For managing information from mobile phone users, this is a problem because that content falls well below the threshold of Open Calais. So our partnership allows their service to supplement ours and vice versa.

Active Mobile

SiLCC does one thing in particular differently from many similar apps out there. Rather than exist as a service that has to be improved by the developers (us), we’ve incorporated active learning techniques that allow it to learn autonomously. This is because we don’t know where or when the next crisis that needs to be monitored will occur. We don’t know who will set up the next SwiftRiver instance or what they’ll use it for. So we designed SiLCC to adapt to any and all scenarios by learning from the instance of use, rather than the top-down approach of tweaking the app on demand. This is known as persistent tagging. SiLCC auto-tags content, but also self-improves and accumulates knowledge (rather, conditions that it can use to improve future decisions).

Natural language processing geeks will wonder whether they can define their own corpora and add words specific to their organization or event directly to SiLCC. Of course they can; this saves time and also improves performance. Additionally, by default we’ve included corpora for dealing with Twitter ontology as well as the TXTSPK (text speak) commonly used by mobile phone users.

Secret Ontology

Finally, the fact that we can predefine corpora gives organizations the option of setting up codes for people to utilize the system remotely. For instance, we could customize an Ushahidi instance to automatically verify and map any text message that contains a unique string (example “Help trapped in Port-au-Prince Market #a1u9”). That trailing string of alphanumeric characters is like a password that tells the system to do something. An organization could set up these unique character strings and functions, giving them only to people they send to the field. In the event of an emergency, that person could communicate with HQ in ways that the other users of the system couldn’t. We have other apps for auto-detecting location, which makes it simple to extract that data as well. Rather than take a laptop into the field to map data, an organization could set up a specific set of keywords that represent locations or events. Then workers, armed only with phones with SMS functionality, could use the system remotely.
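A sketch of how such a trailing code might be recognized is below, in Python. Nothing here is part of SiLCC or Ushahidi: the four-character code format is inferred from the “#a1u9” example, and the `FIELD_CODES` table and the `verify_and_map` action name are hypothetical.

```python
import re

# Hypothetical table of codes handed out to field staff, mapped to the
# action the system should take when it sees one.
FIELD_CODES = {"a1u9": "verify_and_map"}

# A '#' followed by four letters or digits at the very end of the message.
CODE_PATTERN = re.compile(r"\s#([a-z0-9]{4})\s*$", re.IGNORECASE)


def parse_message(text):
    """Split an SMS into (report_text, action).

    action is None when there is no trailing code or the code is unknown,
    in which case the message is returned unchanged (apart from trimming).
    """
    match = CODE_PATTERN.search(text)
    if not match:
        return text.strip(), None
    action = FIELD_CODES.get(match.group(1).lower())
    if action is None:
        return text.strip(), None
    return text[:match.start()].strip(), action
```

For the example message above, `parse_message` returns the report text without the code together with the `verify_and_map` action, while a message ending in an unrecognized code is treated like any other report.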

This isn’t why we designed the app, and I doubt many orgs will use it this way, but I think it makes for an interesting possible extension of the Ushahidi platform. A more common use will probably be differentiation between actionable (someone needs something done now) and non-actionable reports (nothing needs to be done) for emergency response organizations.




We announced our alpha of SiLCC last week. If you’re interested in applying to be an alpha tester, click here. SiLCC is open source, so if you’d like to contribute to the project as a developer, follow the project on GitHub.