Algorithms Augmenting Human Decisions

Here’s an update about the SwiftRiver platform from PDF11, where I had the pleasure of speaking yesterday. My slides are below, and here you can find video of my presentation.

Crowdsourcing 102: Mining Real-Time Data  

The summation of the talk is that the Swift project has been assigned a very complex and incredibly difficult task: to verify and contextualize data from the mobile and social web. How do we do this? This seems to be the part that confuses people. It’s not any one of our apps, and it’s not any of the individual APIs that we rely upon. It’s the combination of all these things together, as part of one robust algorithm that tries to digitally reconstruct the real-world context, using the features extracted from the content to prioritize and de-prioritize information relevant to that context.

I like to refer to this as folksonomic triage, where layers of historic, social, temporal, geospatial and other types of information are layered on one another to perform a function, and the system (through a process called active learning) then learns how to improve from the user’s interaction. What this attempts to do is allow the human to give the machine algorithms some insight into the types of content they prefer, and the types of content they don’t. A statistical profile of the content features of each type is recorded, with varying degrees of nuance in between, including accounting for bias, crosstalk, irrelevancy and falsehoods.
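To make the idea of a "statistical profile" a little more concrete, here is a minimal sketch in Python. It is not the SwiftRiver algorithm; the class name, the feature lists and the scoring rule (a smoothed difference in how often a feature shows up in preferred versus rejected content) are all assumptions made for illustration.

```python
from collections import Counter


class FeedbackProfile:
    """Hypothetical per-user profile of content features, built from feedback."""

    def __init__(self):
        # How many preferred / rejected items each feature has appeared in.
        self.counts = {"preferred": Counter(), "rejected": Counter()}
        # How many items of each kind the user has rated so far.
        self.totals = {"preferred": 0, "rejected": 0}

    def record(self, features, preferred):
        """Store one piece of user feedback about an item's features."""
        label = "preferred" if preferred else "rejected"
        self.counts[label].update(set(features))
        self.totals[label] += 1

    def score(self, features):
        """Return a value in (-1, 1); positive means 'looks like preferred content'."""
        unique = set(features)
        if not unique:
            return 0.0
        total = 0.0
        for feature in unique:
            # Add-one smoothing so unseen features score as neutral.
            p = (self.counts["preferred"][feature] + 1) / (self.totals["preferred"] + 2)
            r = (self.counts["rejected"][feature] + 1) / (self.totals["rejected"] + 2)
            total += p - r
        return total / len(unique)
```

Each vote a user casts would call `record`, and incoming items could then be prioritized by `score`. A real system would weigh many more signals (source, time, location) than this toy feature list.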

Some of this happens on the application side, some of it happens on the logic/cloud side of things. This is because it’s very important that users understand that the platform is there to serve them, and not the other way around: algorithms augmenting human decision making. This means we’ve abstracted some elements of the system logic (the elements that everyone needs to re-use over and over again), while the things specific to a given use of the platform are defined in the UI.

Use Cases

We’re really excited to have had a number of really amazing partners new and old using the platform. This includes groups like Newsti.ps who are building a ‘people’s newswire’ using the Swift products.

There are also some really big uses that are occurring. For instance this BBC article profiles the PAX system that is using our platform to power a conflict early warning system. They want to index massive amounts of data from around the world and then use that data to spot historic patterns and trends that then can be used to demonstrate confidence in future patterns.

One of our favorite uses of the Swift platform to date was Product (RED)’s use last year to mashup large quantities of social media activity to power their Turn The World (RED) campaign.

There have been many more uses that we can’t talk about yet, but hopefully those become public soon.

Some Numbers

There are currently eight different code repositories housing the greater Swift project. Each of these API elements is tackled as if it were a single problem. This includes code for location disambiguation, natural language processing, influence detection, reputation monitoring and duplication filtering. You can find more about them here - http://blog.swiftly.org/post/5788873594/resources-for-developers

  • These combined repos contain around 150,000 lines of code (not including frameworks like Kohana)
  • Over 7,000 downloads of Sweeper to date
  • Which theoretically means at least 7,000 users of our APIs
  • Sweeper users tend to aggregate thousands of items of content over the life of a deployment, which means we’ve taken around 70,000,000 items of unstructured data and done things to it like add location, add tags or filter the duplicates. That’s a very liberal extrapolation, but it gives you a sense of the amount of data we’re dealing with.
  • As the project moves forward, and all our APIs are finally completed, this number will grow exponentially. With RiverID alone (which tracks the reputation of content and people online) we expect to be indexing over half a billion items of content and actions from the social web by the end of the year. That’s just one API; the others will also need to scale on equal terms.

Photo by Fabrice Florin

Translating Realtime Social Media

One of the problems a lot of crowdsourcing projects have is that they end up pulling in massive amounts of data from the web, Twitter and other channels from around the world. This means content arrives in many different languages, often languages that the deployer doesn’t speak.

Currently in Sweeper and soon in Ushahidi, users can translate real-time content from one language into another, on the fly, as they receive it. This is done using our Google Translate plugin which currently supports 50+ languages.

For the Sweeper deployment we’re using to monitor the situation in Japan internally, we’re using this feature to monitor events, since we can’t manually translate every single message coming through. We’ve found it a significant timesaver. You can also see below that we’re showing the user what language the message was translated from, or if it’s been translated at all…

Before:

After:

It’s important to understand that this is machine translation, so it’s far from perfect. But if you’re monitoring feeds from multiple countries across Twitter, RSS, Email or SMS, it’s sometimes useful enough to get a quick sense of what’s being said, where to potentially look for more info, or perhaps where to direct human translators.

Positions Open at Swiftly.org

Ushahidi is currently seeking to hire individuals in the following full-time and contract positions: Sr. Web Application Developer, Online Ethnographer/Behaviorist, Computational Linguistics Expert. As these positions are filled this post will be updated to reflect what’s still available.

Contact: Jon Gosier, Director of SwiftRiver at jg[at]swiftly.org

Sr. Web Application Developer (Python/PHP)

Experience Requirements: At least 4 years professional experience in PHP/XHTML/MySQL/CSS building web applications. This position is minimum full-time for 12 months. Developers with a background in Design, experience with Ruby (Rails), Python (Django) and PHP Frameworks are definitely preferred but all candidates are welcome to apply.

Location: Anywhere, Global

Salary: $60k per year, U.S. dollars. 75% full-time commitment expected although candidates are welcome to maintain side-projects so long as they don’t affect primary deliverables and deadlines.

Online Ethnographer/Behaviorist

Experience Requirements: PHD or PHD-Candidate level with a background in the qualitative study of network dynamics and ethnography of online communities. Position will require deep analysis of dynamics in online communities, and work alongside computer science teams to assist in the development of applications and algorithms based upon their research.

This position is minimum full-time for 12 months.

Location: Anywhere, Global

Salary: $60k per year, U.S. dollars. 75% full-time commitment expected although candidates are welcome to maintain side-projects so long as they don’t affect primary deliverables and deadlines.

Computational Linguistics Expert (Python)

Experience Requirements: At least 5 years professional experience in the development of computational linguistic algorithms using Python. Applicant would supervise the development of open-source semantic technologies, with an emphasis on modularity and scalability. This position is contract.

Location: Anywhere, Global

Salary: Contract. Negotiable.


Sweeper v0.3.0 Released

Download Sweeper V0.3 Now - Click Here!

Hi all you Swiftriver and Sweeper followers out there.

The Swiftriver team is over the moon to announce the launch of our latest version of the Sweeper app.

Those of you who have been following our progress will know that this release comes hot on the heels of the V0.2 that we pushed to you all a couple of months ago.

As always, the Swiftriver guys and girls have had their heads down crafting and coding some great new features that go a long way to making Sweeper bigger and better than ever before.

So, what can you expect to see out of the V0.3 release?

The Sweeper Dashboard

Users of the new release will be greeted by a lovely shiny new dashboard that makes use of our custom built analytics module and the great jQuery graphing library jqplot [http://www.jqplot.com/] (thanks Chris for this easy to use and powerful tool). You can expect to see a lot more data visualisation in upcoming releases and we intend to utilise the power of the new analytics module throughout the Swiftriver family. 

Tag Based Navigation & Channel Based Navigation

Building on our powerful tagging service, the latest version of Sweeper now offers users the ability to refine their view of content based on tags that are important to them. Simply click on any tag in the content list and see the list repopulated with only content that contains that tag!

Want to compare and cross check what people are saying on Twitter with what they are posting to Flickr? Well now you can with Sweeper V0.3. The new channel based navigation filter makes it dead easy to view only content collected by a specific channel while still continuing to collect and process content from all over the web.

 

Content Clustering

This is a great new tool that allows you to see how similar other content is to a new piece of content. Make sense? Well basically, every new piece of content that comes into Sweeper now has some scores attached to it, showing you how its set of tags match up to the larger set of tags in the system. To make this really relevant, we first show how similar this content is to all other content, then we show how similar it is to content you’ve already marked as ‘accurate’. We expect this feature to grow and grow and the Swiftriver team think this will be a key metric in the battle to cut through the noise and get down to only really interesting content.
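For the curious, the kind of score described above can be sketched in a few lines of Python. This is only an illustration, not Sweeper's actual scoring code: the function name and the choice of metric (the fraction of an item's tags that already appear somewhere in a pool of content) are assumptions.

```python
def tag_overlap(item_tags, pool):
    """Fraction of the item's tags that already occur in a pool of tagged content.

    `pool` is a list of tag lists, one per existing piece of content.
    """
    item = set(item_tags)
    if not item:
        return 0.0
    pool_tags = set()
    for tags in pool:
        pool_tags.update(tags)
    return len(item & pool_tags) / len(item)


# Hypothetical data: tags for everything in the system, and the subset
# the user has already marked as 'accurate'.
all_content = [["flood", "brisbane"], ["election", "kenya"], ["flood", "rescue"]]
marked_accurate = [all_content[0]]
new_item = ["flood", "kenya", "traffic"]

similarity_to_all = tag_overlap(new_item, all_content)
similarity_to_accurate = tag_overlap(new_item, marked_accurate)
```

Computing the same measure twice, once against all content and once against the 'accurate' subset, gives the two views of similarity mentioned above.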

Content Refresh Message

The new version of Sweeper keeps you updated with messages about how many new pieces of content have been collected!

As always, we didn’t manage to cram everything into this release that we would have liked to – but no worries, it just makes the next release all the juicier!

So before I go, I can give you a little taste of some of the things you can expect to see from the next release:

  •  Revised Sweeper Panel
    We have been doing some work internally on improving the user workflow around voting on content. Expect the next release to have some jazzy new buttons for you all to press!
  •  Pluggable Content Ordering Module
    Building on some of the great statistical tools we have in Sweeper already, the next release will ship with a whole new plugin framework (in addition to our Parser and Turbine models) that will allow you to change the order in which content is delivered based on things like location, RiverID score, tag clustering etc. And because it’s pluggable, you can dream up your own wild and wonderful ways of sorting the quality from the noise!
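To give a flavour of what "pluggable" ordering could look like, here is a small sketch in Python. The plugin framework itself has not shipped, so everything here (the registry, the decorator, the item fields `riverid_score` and `timestamp`) is hypothetical and only illustrates the idea of swapping in a sort key by name.

```python
# Registry of ordering plugins: name -> function returning a sort key.
ORDERINGS = {}


def ordering(name):
    """Decorator that registers a sort-key function under a plugin name."""
    def register(func):
        ORDERINGS[name] = func
        return func
    return register


@ordering("riverid_score")
def by_riverid_score(item):
    # Negate so that higher scores sort first.
    return -item.get("riverid_score", 0.0)


@ordering("newest_first")
def by_newest(item):
    return -item.get("timestamp", 0)


def order_content(items, plugin_name):
    """Return the items sorted by whichever ordering plugin was selected."""
    return sorted(items, key=ORDERINGS[plugin_name])
```

A new way of sorting would then just be another decorated function; nothing else in the app would need to change.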

Well that really is it for now. Thanks for following our project and please get in touch if you have any issues, suggestions or would like to contribute.

Thanks as always to the Swiftriver team and our brothers and sisters over at Ushahidi for making it all possible!

Matt

Director of Platform, Swiftriver

SwiftRiver Dataflow Infographic

I’m often asked about the architecture of the SwiftRiver platform. There’s been so much written about, talked about and presented to date that I thought I’d take a different approach.  So rather than bore you with another long blog post, I thought I’d share some visuals that explain the system.

PDF | Video | High-Res Image


If the images above are too small, try downloading the PDF version, High-res image or watch the video below.

Don’t forget to vote for us in the Knight Challenge!

Localizing News

The following post was written by a volunteer developer, Vladimir G. Ermakov, a Master’s student at Carnegie Mellon University in Pennsylvania. Over the past few months he took on an ambitious project: to contribute code that would allow us to parse news articles and attempt to auto-detect the primary location that is the subject of any given text.


Localizing News by Vladimir Ermakov

The amount of information available in electronic format is rapidly increasing. It is becoming possible to find out in real time about current events in a particular part of the world based on electronic data such as news articles, blog entries, Twitter feeds and SMS messages. Even though the data is available, there is an overwhelming amount of it and it is hard to stay on top of events that are of relevance. Getting informed about recent developments is particularly important in times of crisis, when lives could depend on timely response. In this project I am exploring ways to pinpoint the location discussed in text documents. I am able to achieve good results by combining location keywords extracted by the Yahoo! Placemaker service with state of the art machine learning and natural language processing techniques.

The basic approach that I’ve embarked upon is to extract location keywords from a document using the Yahoo Placemaker service, and then apply classification techniques to disambiguate which of these locations is most relevant to the document at hand. I’ve conducted experiments with Naïve Bayes and Fisher classifiers using a bag of words model for feature extraction, but these did not give good results. I explored an alternative approach: use the count and position of location keywords extracted by Placemaker and feed them into an SVM. This proved to be a very effective way of determining the country that is the focus of the document. Applying lemmatization to location adjectives such as Russian and converting them to nouns such as Russia helped improve the results even further.
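The count-and-position features described above can be sketched roughly as follows. This is an illustration rather than Vladimir's code: a small hard-coded set of country names stands in for the Placemaker lookup, the adjective table is a toy, and the feature definition (share of all location mentions, plus relative position of the first mention) is an assumption about what "count and position" might mean in practice.

```python
# Toy lemmatization table: location adjective -> country noun (illustrative only).
ADJECTIVE_TO_COUNTRY = {"russian": "Russia", "french": "France", "kenyan": "Kenya"}


def location_features(tokens, known_countries):
    """Map each mentioned country to (normalized count, relative first position)."""
    mentions = {}
    for index, token in enumerate(tokens):
        word = token.strip(".,;:!?\"'()")
        # Convert adjectives such as 'Russian' to the noun 'Russia'.
        country = ADJECTIVE_TO_COUNTRY.get(word.lower(), word)
        if country in known_countries:
            mentions.setdefault(country, []).append(index)
    total = sum(len(positions) for positions in mentions.values())
    return {
        country: (len(positions) / total, positions[0] / len(tokens))
        for country, positions in mentions.items()
    }


tokens = "Russian officials met in Moscow while France and Russia signed a deal".split()
features = location_features(tokens, {"Russia", "France"})
```

Feature vectors like these, one per candidate country, are what would then be handed to the SVM classifier to decide which country the article is really about.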

While Reuters-21578 was a great dataset to use for training classifiers and experimenting with the data, the articles there were collected 20 years ago. What made this project interesting for me is the possibility of visualizing the news around the world on a map, and seeing whether a sudden rise in the number of articles published can be an indicator of some important events.

To make this possible I had to obtain a recent dataset. Reuters has archived articles from the last several years on their website. I developed a simple crawler that visited news articles from this archive, downloaded them to my server, and extracted the news article text content. I then passed this content off to the Yahoo Placemaker service, and output the data with the location labels into XML files. I then could use my scripts to run the experiments on this new dataset, just like I did with the original data.

I limited my data collection to the most recent articles. The archive contained over 400,000 news articles for 2010, which was too many to download. I restricted the crawler to randomly pick 10% of the articles from each day of the year. This was still a significant amount of data, 80,000 articles, and fairly representative of the whole archive.
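Sampling a fixed fraction per day, rather than a fraction of the archive as a whole, keeps quiet days and busy days proportionally represented. A minimal sketch of that step in Python, assuming each article is a dict with a `date` field (the field names, the seed and the "at least one per day" rule are illustrative choices, not details of the original crawler):

```python
import random
from collections import defaultdict


def sample_per_day(articles, fraction=0.1, seed=0):
    """Randomly keep `fraction` of the articles from each day."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_day = defaultdict(list)
    for article in articles:
        by_day[article["date"]].append(article)
    sampled = []
    for day in sorted(by_day):
        day_articles = by_day[day]
        # Assumption: keep at least one article even on very quiet days.
        keep = max(1, round(len(day_articles) * fraction))
        sampled.extend(rng.sample(day_articles, keep))
    return sampled
```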

After all the experiments I was able to narrow down to a working solution for mapping news articles: extract location information from the article using the Yahoo Placemaker service, making sure to lemmatize location adjectives; extract the normalized count and position of location keywords within the article; and apply an SVM classifier to decide which of these locations is most important to the article. The results were encouraging, and I believe this solution is ready to deploy into a real world application. I am hoping to implement an extension to the Swiftriver platform in the near future that uses this method to classify news articles by country.


Vladimir’s paper is a much longer, and much more fascinating, read than I could share here. If you’d like to read it, he can be reached by emailing vermakov [at] emu [dot] edu.

We’re working on folding this and other contributions into the next release.  Thanks for the awesome work Vladimir!  Other developers interested in contributing to the Swift platform can find out more here.

Sweeper v0.2.0 Released

It’s been a while since our last major product release. This is partly because we’ve been working on big projects like the Queensland Floods and our project with Product (RED), and partly because of the holiday break.  However we’re back with a ton of goodies for Sweeper users in 2011.  

We’re happy to release the newest build of SwiftRiver:Sweeper today.  Here’s a rundown of features from Director of Platform, Matthew Griffiths.


Today we are releasing the latest version of our Sweeper app!

Sweeper 0.2 doesn’t bring a whole lot of new UI wizardry but under the hood we have been beavering away at cool addition after cool addition – plus the odd fix or two for things we didn’t get quite right last time.

In short, some of the wonderful magic you can expect to see in this new release are:

Improved pre-install checks in the Installer

We know that we had some issues with our installer last time out and we have been working hard to fix them for this release. Expect to see new checks for pre-requisites and improved checks around the requirements for the Kohana framework.

Yahoo Placemaker Turbine

By activating this impulse turbine, the text of any content coming into Swiftriver can be sent to the ever popular Yahoo Placemaker Service, and if it mentions a recognizable location then the coordinates of that location can be added to the content.
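Conceptually, a turbine of this kind takes a piece of content, asks a location service about its text, and hands back the content with coordinates attached. The Python sketch below shows only that shape. It does not call Placemaker or use the real turbine interface; `lookup_places` is a hypothetical stand-in for the service call, and `fake_lookup` is a tiny made-up gazetteer used for the example.

```python
def placemaker_turbine(content, lookup_places):
    """Return a copy of the content, enriched with any locations found in its text.

    `lookup_places` is any callable that takes text and returns a list of
    (name, latitude, longitude) tuples; it stands in for the geocoding service.
    """
    places = lookup_places(content["text"])
    enriched = dict(content)  # leave the original item untouched
    if places:
        enriched["locations"] = [
            {"name": name, "lat": lat, "lon": lon} for name, lat, lon in places
        ]
    return enriched


def fake_lookup(text):
    """Illustrative lookup with one hard-coded place and approximate coordinates."""
    gazetteer = {"Brisbane": (-27.47, 153.03)}
    return [(name, lat, lon) for name, (lat, lon) in gazetteer.items() if name in text]


item = {"text": "Flooding reported in Brisbane"}
enriched_item = placemaker_turbine(item, fake_lookup)
```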

Ushahidi Report Push Turbine

The first – and arguably most important – reactor turbine to be released for the Sweeper app. With the release of this Reactor Turbine you can now twin your Sweeper instance with your Ushahidi instance! Something we know a lot of you out there have been waiting for!!!

A ton of new Source Parsers

Allowing users to aggregate content directly into Sweeper from all of these new sources:

  • Eventful
  • Flickr
  • FrontlineSMS
  • Google News
  • Email
  • Meetup.com

For those of you out there who are interested in the development side of Sweeper – and Swiftriver as a whole – there has been a whole host of activity that isn’t covered above. Look away now if you are easily offended by techie speak!

We have completely remodeled the presence of Swiftriver on GitHub – those of you who watch the repo will have noticed it already.

We now have separate repos for the main open source components of Swiftriver (you can find them by going to the Ushahidi page on GitHub [http://www.github/Ushahidi] and looking for Sweeper, SwiftMeme, SiLCC etc.).

As a consequence of this, the old SwiftRiver repo now holds only the framework files and not the individual applications. This is also the reason for the change in version numbering: Sweeper and SwiftMeme are at v0.2.0 while SwiftRiver core remains at v0.6.0.

There are some complexities about working with this new repository structure, but for all those devs out there interested in contributing to the project I will be blogging a little later this month on how exactly to get started with any of the apps.

So, I think that’s about all for now. Have fun with Sweeper V0.2 and as always give us your feedback if you have any!

Download Sweeper v0.2.0

State of the River 2010

A look back at a year of commits to the SwiftRiver project.  

Last December I began working on SwiftRiver for Ushahidi. It wasn’t really until March (after two scrapped codebases) that the project really began to develop.  Our goal at the time was to ‘democratize access to the tools needed for making sense of information’. More practically worded, to create an open source platform for filtering and prioritizing data feeds from SMS, Twitter, Email and the Web. The code has completely changed a few times, but the mission remains the same.

We’re a small team: two people, one largely working part time, plus a third full-timer who just joined this month, and a whole lot of support from our community.

So I thought it would be cool to take a look back at the code written this year, and how far it’s all come. We now have a basic (still very Beta) platform and several apps built on that platform.  However, there’s no better way to review the progress of an open source project than by looking at its code!

Using a number of tools made available by GitHub we can do just that…

SwiftRiver Commit Activity 2010

Above you can see the timeline of commit activity over 52 weeks (previous codebase notwithstanding). Activity is cyclical, representing major milestones for us (initial work, a major rewrite by Matt and the ramp up to Beta).

SwiftRiver Commit Activity 2010

Here you can see the programming language breakdown of our repo. We write most of our apps in PHP. However a lot of product ‘logic’ occurs in our Python applications which power our various apps through APIs. The collection of the APIs is the SwiftRiver Platform, the PHP stuff just sits on top of that platform.

  • 46% PHP
  • 39% JavaScript
  • 14% Python
  • 1% Ruby
These numbers are a bit skewed, though, because our repo now includes a number of PHP applications and versions of those same applications for developers working on various components.

    SwiftRiver Commit Activity 2010

    The orange above represents my own commits to the project. I often commit on behalf of community volunteers or staff so this is a bit off. I’d like to think I write that much code. =)

    Yellow represents Ivan Kavuma, who worked on the platform the most during this timeframe.

    SwiftRiver Commit Activity 2010

    Here Matt (Red) and I (Orange) took turns. My work was largely UI related while Matt rewrote the entire platform for architecture reasons back in May - that’s that massive hump on May 14th which was all new code!!!

    SwiftRiver Commit Activity 2010

    In September I made a massive commit, but it actually wasn’t that prolific at all. Basically I pulled in all our API code bases and centralized them into the main repo. Likewise Matt made some major changes to the repo in September, centralizing all apps in development, and releases, into the main repo. This was all to showcase to interested parties that SwiftRiver is the platform, and not just our flagship app, Sweeper.

    In October Matt and new developer Ahmed Maawy began making massive commits as they began work on our new apps SwiftMeme and SwiftMail.

    Since then we’ve added a lot of new features related to location detection and new parsers for integrating various social media platforms.

    SwiftRiver Commit Activity 2010

    The chart above represents the time of day when most of our commits happen which seems to be between the hours of 2am and 4am on weekends!!  You can check out the code for yourself at https://github.com/ushahidi/swiftriver
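If you’d like to build a similar day-and-hour breakdown for your own repo, the counting itself is simple. Here is a rough sketch in Python; it assumes you already have commit timestamps as ISO 8601 strings (for example from `git log --format=%aI`), and the function name and sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def commit_punchcard(timestamps):
    """Count commits per (day of week, hour of day)."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        counts[(DAY_NAMES[moment.weekday()], moment.hour)] += 1
    return counts


# Made-up timestamps for the example.
punchcard = commit_punchcard([
    "2010-12-04T02:15:00",
    "2010-12-04T02:40:00",
    "2010-12-05T03:05:00",
    "2010-12-06T14:00:00",
])
busiest_slot = punchcard.most_common(1)[0][0]
```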

    Vote in the Knight News Challenge

    Every year the Knight Foundation rewards innovation in technology primarily targeting professional and citizen journalists. The rewards are grants that help projects scale and improve their platforms.  We just entered and wanted to take some time to explain our vision and what we think makes us a worthy applicant.

    What is SwiftRiver’s mission?  To democratize access to tools that can be used to filter and make sense of realtime information from SMS, Twitter, Email and the Web.

    Where do we add value to news? SwiftRiver is free and open source. This includes APIs for natural language processing, location detection, reputation & trust, duplication filtering and influence detection.

    We make these tools open for two reasons: Firstly, because in large newsrooms, staff want complete control over their platforms and they need to be able to modify and customize workflows as needed.  This tends to mean they develop similar tools in-house, which is great for organizations with those types of resources, not so great for organizations who can’t.  Secondly, our goal is to make these advanced intelligence tools available to journalists in even the most remote, unconnected places. 

    Who needs our products?  The strongest demand for SwiftRiver is actually from journalists who are increasingly overwhelmed by the task of sorting through vast streams of data.  We’re actually working with several different groups from around the world who want to use applications like Twitter and Facebook to gather news, who share the problem of identifying the kernels of reliable information amidst a sea of ‘noise’.

    Why should you vote for us? SwiftRiver has gone from merely a concept that was laid out two years ago, to a tangible product over the last year on very limited resources.  

    Although we’re part of the Ushahidi family (still a small company in its own right), we don’t have access to the same financial resources or staff.  They all have their hands full making Ushahidi the great product that it is.  Because we’re a small team, we can’t develop things as quickly as we might like.  Demand is way out-pacing our ability to deliver and scale.

    We’re a very small team: one full-time person, one part-time developer and we’ve only this month added a third.  

    Who are you targeting? Swift is for people overwhelmed by data.  That’s a very broad problem that essentially affects everyone with a computer and connection to the internet.  This makes a singular audience difficult to suss out.  I like to say this: We built a platform and we’re using our platform to target different industries, primarily, data journalists.

    There are many other uses of the SwiftRiver platform, many that people are discovering without our guidance, and hopefully that means what we’re doing is powerful, adaptable, relevant in different scenarios, easy to use and most importantly accessible to all.

    Vote or ask questions about SwiftRiver in the Knight News Challenge.

    (RED) Uses SwiftRiver for World AIDS Day

    Wednesday was World AIDS Day, December 1st. In case you missed it, we unveiled a collaboration that had us working with (PRODUCT) RED and a number of their partners to launch a campaign that allowed users of social media to participate in a global effort to raise awareness of the fact that it’s possible that by 2015 no child will be born with the AIDS virus.

    “With over 1000 children infected with HIV each day, over 90 per cent of which are in Sub-Saharan Africa, the goal of an AIDS Free Generation is one that we at (RED), along with the many organizations in the global health community, are committed to trying to make a reality”. “Since launching in 2006, (RED) has generated more than $160 million for the Global Fund to help finance AIDS programs that provide ARV treatment and help fund treatment to prevent the transmission of HIV from mother to child”.

    Turn the World (RED)

    The whole world has been devastated by HIV and AIDS over the past several decades, but Africa in particular is often seen as the continent most affected by its decimating effects. It’s really inspiring to realize how close we are to eradicating the virus from the planet (possibly in the span of one generation). So it was our great pleasure to have the opportunity to work with (PRODUCT) RED to assist them in their efforts and challenge us all to make 2015 our collective goal.

    The (RED) campaign marks the most extensive use of Ushahidi’s SwiftRiver platform to date. SwiftRiver is an open source tool for aggregating and making sense of real-time data, something we’re working on to improve the ability of all organizations to crowd-source information from the public without becoming overwhelmed by the response.

    So using a new theme we’re working on for Ushahidi, SwiftRiver’s data aggregation/sorting tools and information from various social media portals here’s what we came up with:

    Pre-Launch

    We were able to pull real-time information from Facebook, Flickr, FourSquare, Eventful, Twitter, YouTube, Meetup.com and (RED) partners like Starbucks, GAP and Nike, which allowed people to take actions online to show their support for the campaign: ‘The AIDS Free Generation is Due in 2015’. How does it all work?

    • On Foursquare, users can check-in to unlock a World AIDS Day Badge.
    • On Facebook, users activate an app that changes their profile pictures.
    • On Twitter, users can post messages with hashtag #turnred.
    • On Flickr, users can upload photos with the tag ‘turnred’.
    • On YouTube, watching this video marks your support as well.
    • Users’ real-world actions can also count, by attending (RED) meetups.

    Each user action helps turn their country and timezone a deeper shade of red on the map above. The culmination of user actions around the world would effectively turn the entire planet red - not to represent total awareness, but rather the support this initiative has from people all around the world. We’re happy to announce that this project will live on indefinitely throughout the year (hopefully into 2015), allowing users to continue showing their support far beyond Dec 1, 2010.
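The "deeper shade of red" idea boils down to mapping an action count onto a color scale. Here is one way that mapping could be sketched in Python. It is not the code behind the campaign map; the linear scale, the function name and the choice of white as the starting color are all assumptions.

```python
def shade_of_red(action_count, max_count):
    """Map a country's action count to a hex color from white to full red."""
    if max_count <= 0:
        return "#ffffff"
    # Clamp so counts above the maximum still map to full red.
    intensity = min(action_count, max_count) / max_count
    # Fade the green and blue channels out as intensity rises.
    fade = round(255 * (1 - intensity))
    return "#ff{0:02x}{0:02x}".format(fade)
```

With this scale, a country with no recorded actions stays white (`#ffffff`) and the most active country is rendered in full red (`#ff0000`).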

    So what was the result?

    World Turns (RED)!

    A (RED) world!

    This article also appears on the Ushahidi Blog, here.