Some Reflections on Sweeper from N.E.A.T. Nigeria

In April we were contacted by a group from Georgia Tech and M.I.T., along with students on the ground in Nigeria, about the then-upcoming elections. This group, working together as N.E.A.T. (the Nigerian Election Aggregation Team), wanted to run a campaign that mashed up data from several different Ushahidi deployments, Twitter and other sources, displaying them in their own Ushahidi deployment. They ended up writing a lot of custom code, but this was the first ‘stress test’ of the SwiftRiver platform and our Sweeper application to date.

The following is a review of the N.E.A.T. team’s experiences with Sweeper. It was written by Thomas Smyth from Georgia Tech just after their election project was complete on May 2:

What Sweeper Did Well

  • Quick setup: Jon had our instance up and running in what seemed like a heartbeat. This was much appreciated.
  • Reliability: Sweeper stayed up pretty reliably as long as I didn’t break it!
  • Auto-Tagging: This feature was pretty neat and our system used Sweeper’s tags for meta-analysis.
  • Support: Matt was available consistently for in-depth help and scheming. We appreciated this.

Issues With Sweeper

  • Bugginess: Several major bugs were encountered, e.g. the duplication service. But this is to be expected for a young project.
  • Twitter lag: Twitter updates weren’t showing up for many minutes. Since Twitter was our main source of timely information, this was a big problem. We ended up implementing our own scraper using Twitter’s stream API, which has worked brilliantly. Matt and I have discussed this.
  • Searching: Sweeper currently doesn’t allow searching of reports, and this was a desired feature which we implemented. We also implemented a ‘saved search’ feature, which turned out to be quite useful. It allows the user to specify a search string (such as “guns or bombs or knives”) to be “tracked”. The system then searches all incoming reports and maintains a time series visualization. This allows a user to see what topics are ‘spiking’. Something like this would fit nicely in the analytics panel in Sweeper.
  • Analytics panel: There are a few good things here but the interface could be a lot denser, so that more useful analytics could be added. For instance, top tags could be represented with a compact table rather than a bar chart. Charts should only be used in cases where the visual representation provides a clear benefit. Pie charts are usually unnecessary, etc.
  • Geolocation problems: The automatic geolocation service was quite dodgy. I didn’t do any actual counting but I’d say upwards of half the results were wrong. I think it’s a difficult thing to do automatically. So much ambiguity, etc. We ended up building a custom solution for geolocation, incorporating polling booth data (120k of them!) from INEC. The system could automatically recognize a polling unit code like 03/04/12/013 in a tweet, and translate that into a geolocation.
  • Scanning interface: The main interface of Sweeper, where users quickly scan through reports and categorize them, could be more efficient. It’s not clear why each report needs to take up so much space, and why the interface doesn’t scale to fit the whole screen. The animations were also somewhat disorienting. In our system, we tried an approach where users ‘checked out’ a batch of 10 reports and quickly scanned them in a compact table format, marking relevant ones with a checkbox. This seemed to work nicely, and didn’t require (I think) as many requests to the server. In general, I think Sweeper’s interface could be tightened a lot. Users are more likely to be experienced, frequent visitors, rather than occasional ones (I think). Therefore you can make it a little more efficient and specialized than a general purpose website. I think users would appreciate this. I’d be happy to consult further here if there is interest.
  • Code and documentation: Much of the functionality described above could perhaps have been added to Sweeper. However, we found it hard to get started on adding plugins. The codebase could be better organized so that it is clear where code for different components should go. The code itself could also be cleaner in places. Also, documentation needs to be available. But again, we realize Sweeper is a young project and these things are surely on the TODO list!
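
Two of the custom pieces described in the list above, recognizing a polling unit code in a tweet and tracking a ‘saved search’ as a time series, can be sketched roughly as follows. This is only an illustration, not the N.E.A.T. team's actual code: the function names, the two/two/two/three-digit pattern (inferred from the single example 03/04/12/013), the lookup table with its placeholder coordinates, and the hourly bucketing are all assumptions.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed code shape, inferred from the one example "03/04/12/013".
POLLING_UNIT_RE = re.compile(r"\b\d{2}/\d{2}/\d{2}/\d{3}\b")

# Hypothetical lookup table; the real one would be built from the INEC
# polling booth data. The coordinates here are placeholders.
POLLING_UNITS = {"03/04/12/013": (9.0579, 7.4951)}


def geolocate(text, units=POLLING_UNITS):
    """Return (lat, lon) for the first known polling unit code in text."""
    for code in POLLING_UNIT_RE.findall(text):
        if code in units:
            return units[code]
    return None


def parse_saved_search(query):
    """Split a query such as 'guns or bombs or knives' into its terms."""
    parts = re.split(r"\s+or\s+", query, flags=re.IGNORECASE)
    return [p.strip().lower() for p in parts if p.strip()]


def hourly_counts(reports, query):
    """Count matching reports per hour; reports are (datetime, text) pairs.

    Only single-word terms are handled in this sketch.
    """
    terms = parse_saved_search(query)
    counts = Counter()
    for timestamp, text in reports:
        words = set(re.findall(r"\w+", text.lower()))
        if any(term in words for term in terms):
            hour = timestamp.replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return dict(sorted(counts.items()))
```

Plotting the output of `hourly_counts` over time would give the kind of ‘spiking’ view described above; a tweet with no recognizable code would simply fall back to whatever other geolocation is available.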

That’s all I have for now guys. Let me know if you have any questions. Many thanks for everything. Let’s keep talking!

This is great feedback. Some of it we’ve already begun working on, while the rest (both the code and the suggestions) has been added to our roadmap.

(Photo from http://www.uiowa.edu)

Knight News Challenge Grant!

It’s truly an honor to accept a $250,000 grant from the Knight Foundation for the SwiftRiver project! It’s the culmination of a long journey that began in 2008 but evolved in 2010 when I joined the project as product designer, followed later by Matthew Griffiths as lead developer.

Swift is an open-source initiative whose goal is to make the process of vetting information more efficient. The project to date has progressed well thanks in no small part to the following people: Matthew Griffiths (so important to this project I mentioned him twice), Ahmed Maawy, Charl Van Neikerk, Heather Ford, Vladimir Ermakov, The Ushahidi team, Omidyar Networks, Chris Blow, Ed Bice, Kaushal Jhalla, Neville Newey, Edmar Ferreira, Pete Warden, Patrick Meier, Anahi Ayala, Ethan Zuckerman, the TED staff, Google’s 2010 Summer of Code Participants (Mang-Git, Soe, Nishith Rastogi), the Guardian’s Activate staff, Product(RED) and many others. This project would be nowhere without you all so thanks for making it happen.

For many of us, this project represents a new way of democratizing access to the tools for understanding and vetting information, tools needed by Ushahidi, journalists, and many others.

Building a people’s newswire with Newsti.ps

[Guest blog post by Jenka Soderberg, a 2011 Knight Fellow at Stanford University and Evening News Director at KBOO Community Radio in Portland, Oregon. She can be reached at jenka [at] stanford [dot] edu]. This is a cross-post from the Ushahidi blog.

When I first started working on www.Indymedia.org in 2000, I was really excited about the platform it provided: a way for people who witnessed news events to immediately publish text, audio, video and photos to an OPEN newswire.  This was unprecedented on the web at that time, and led to an explosion of open multimedia content-posting sites.  Since its inception at the World Trade Organization protests in Seattle in 1999, the Independent Media Center expanded into over 200 local sites worldwide, all funneling featured content into the main (global) site www.indymedia.org.  In many ways, this could represent the way news organizations operate in the future – but most of the major news companies haven’t caught on to this trend just yet.

I got into the world of journalism because I didn’t trust the media.  Time and again, I’d read, hear or watch news stories that were grossly inaccurate, one-sided and oversimplified.  So I took seriously the slogan, “Don’t hate the media, BE the media”, and helped launch a bunch of indymedia centers and microradio stations all over the world, always with the hope of giving voice to the voiceless, allowing people to tell their own stories and to share in the narrative that was developing about them without the often-damaging involvement of advertising dollars and managing editors who presume to dumb things down for audiences they believe they have to entertain as well as inform.

Now, with more and more people turning away from traditional media to get their news online (see chart), it seems those audiences, about whom so many assumptions were made by the management of media corporations, are trying to find their own way in the new media world and find the news that they think is important and valuable.

Unfortunately, this often means that people seek out only news sources that confirm and uphold their existing points of view, and may be just as full of inaccuracies, speculation and oversimplification as the news media that they were trying to escape.

How can we get through the mess of misinformation to find the real tips of breaking news events, as they’re happening, and get this information out to as broad an audience as possible?

I’ve been working with a team at Stanford this year to use Ushahidi’s Swiftriver platform, and specifically Sweeper (one of the multiple tools in the Swiftriver toolbox) to try to extract real newstips from the deluge of 140-character texts and tweets, and try to figure out which newstips are real and accurate.  Our project description and current newswire is at www.newsti.ps

We’re implementing this in the Occupied Palestinian Territories, an area where many news incidents are under-reported in the US, and others are over-reported, giving US audiences a skewed perspective of the reality on the ground. We’re using the Swiftriver platform to skim the web and Twitter for reports, which are then filtered by keyword, location, reputation and duplication and organized into a database. Our reporters in different parts of the Palestinian Territories (the West Bank, Gaza and Jerusalem) can follow up on the most poignant of these tips and verify their accuracy. These reporters have created the International Middle East Media Center (www.imemc.org), currently the most widely-read English-language news site based in the Palestinian Territories.
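
To give a rough idea of what that filtering step involves, here is a minimal sketch of keyword, reputation and duplicate filtering. It is not Swiftriver's actual code or API: the `filter_tips` and `normalize` functions, the `text` and `reputation` field names, and the 0.5 threshold are all hypothetical, and location filtering is left out.

```python
import re


def normalize(text):
    """Reduce a report to a comparable form: lowercase, no URLs,
    no leading retweet prefix, no punctuation."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)
    return " ".join(re.findall(r"\w+", text))


def filter_tips(items, keywords, min_reputation=0.5):
    """Keep items that mention a keyword, come from a source with
    sufficient reputation, and have not been seen before."""
    wanted = {k.lower() for k in keywords}
    seen = set()
    kept = []
    for item in items:
        key = normalize(item["text"])
        if not wanted & set(key.split()):
            continue  # no keyword match
        if item.get("reputation", 0.0) < min_reputation:
            continue  # source not trusted enough (assumed 0..1 score)
        if key in seen:
            continue  # duplicate of an earlier report
        seen.add(key)
        kept.append(item)
    return kept
```

Whatever survives a pass like this is what a reporter on the ground would then follow up and verify; the automation only narrows the stream, it does not establish accuracy.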

We’re also working on a way to allow people who witness news events but don’t have the luxury of a smart phone yet (only 2% of cellphone users in the Palestinian Territories have smart phones, and 3G is extremely spotty) to send texts and photos directly into our system as well. For translation of Arabic texts, we’ve solicited the help of the crowdsourced translation team of www.meedan.net.

As with Indymedia, we think that this work can be an alternative to the mainstream media. As always, they are free to use these news stories, though it seems unlikely that many will. When news corporations are focused on selling advertising instead of providing accurate news for their audiences, they will continue to go the way of the dinosaurs, as they are doing. Unfortunately what we’re losing right now are lots of good, investigative news reporters who held politicians’ feet to the fire, reported on breaking news events and local issues, investigated wrongdoing by large companies, and connected audience members with the stories of people in different circumstances far across the globe, but with whom they could relate due to the strength of the writing and storytelling. What we’re left with right now, to a large extent, are cable news channels whose focus is on entertainment and advertising, and vitriolic talk radio that exuberantly embraces speculation, rumor and misinformation over fact-checked, accurate news reports. On the local news front, AOL’s newest brainchild, patch.com, threatens to replace real local reporting with half-hearted, badly-written reports that are unapologetically inaccurate.

Can we get a ‘people’s newswire’ based on eyewitness reports of newsworthy events?  I believe we can – if we combine the automation of systems like Swiftriver, the data visualization possibilities of tools like Ushahidi, and the insight of trained reporters who can follow up on potential leads.  Heck, if we can do it in the Palestinian Territories, then we can do it anywhere!

The video below is a short presentation about this project. Be sure to check out our website www.newsti.ps for real-time updates during the upcoming humanitarian flotilla to break the siege on the Gaza Strip.

Vote in the Knight News Challenge

Every year the Knight Foundation rewards innovation in technology primarily targeting professional and citizen journalists. The rewards are grants that help projects scale and improve their platforms.  We just entered and wanted to take some time to explain our vision and what we think makes us a worthy applicant.

What is SwiftRiver’s mission?  To democratize access to tools that can be used to filter and make sense of realtime information from SMS, Twitter, Email and the Web.

Where do we add value to news? SwiftRiver is free and open source. This includes APIs for natural language processing, location detection, reputation & trust, duplication filtering and influence detection.

We make these tools open for two reasons. Firstly, in large newsrooms, staff want complete control over their platforms and need to be able to modify and customize workflows as needed. This tends to mean they develop similar tools in-house, which is great for organizations with those types of resources, not so great for organizations without them. Secondly, our goal is to make these advanced intelligence tools available to journalists in even the most remote, unconnected places.

Who needs our products? The strongest demand for SwiftRiver is actually from journalists who are increasingly overwhelmed by the task of sorting through vast streams of data. We’re working with several different groups from around the world who want to use applications like Twitter and Facebook to gather news, and who share the problem of identifying the kernels of reliable information amidst a sea of ‘noise’.

Why should you vote for us? Over the last year, on very limited resources, SwiftRiver has gone from merely a concept laid out two years ago to a tangible product.

Although we’re part of the Ushahidi family (still a small company in its own right), we don’t have access to the same financial resources or staff. They all have their hands full making Ushahidi the great product that it is. Because we’re a small team, we can’t develop things as quickly as we might like. Demand is way outpacing our ability to deliver and scale.

We’re a very small team: one full-time person, one part-time developer, and we’ve only this month added a third.

Who are you targeting? Swift is for people overwhelmed by data. That’s a very broad problem that essentially affects everyone with a computer and a connection to the internet. This makes a singular audience difficult to suss out. I like to say this: We built a platform and we’re using our platform to target different industries, primarily data journalists.

There are many other uses of the SwiftRiver platform, many that people are discovering without our guidance. Hopefully that means what we’re doing is powerful, adaptable, relevant in different scenarios, easy to use and, most importantly, accessible to all.

Vote or ask questions about SwiftRiver in the Knight News Challenge.