Crowdsourcing and Chaos Theory

This is a repost from the Ushahidi Blog. Below you’ll find the basis for my Ignite talk from ICCM10 in Boston originally titled “Veracity Blues: The Trouble with Crowdsourcing”.

[Image: varsity-blues-800-75]

Would you trust these guys with your data?

Not based on appearance, but on experience or lack thereof. It depends, right? You might trust them to lead you to a Heisman Trophy, but you probably wouldn’t trust them to lead your disaster relief initiative. This is because expertise matters only when coupled with context. But there’s another analogy here, and that’s one of how organizations view crowdsourced information.

The people in the above image share a common expertise. They are a small team. They trust each other’s experience. They’ve been vetted and validated as being the best of their lot by a coach. They plan their operations before they fully understand the problem. Who else does this remind you of?

The military. This isn’t a weakness. It’s how the military *has* to respond. Advance insight is not always a possibility. No matter what the stimulus, the strategy has to be able to adapt to deal with unknown variables. Likewise, when a football team is in a stadium, they don’t have the benefit of waiting for a lot of analysis. Decisions have to be made on the fly. Football teams *can’t* rely upon the crowd, nor can most military operations.

An Aversion to Crowdsourcery

My colleague Patrick Meier likes to describe the reluctance to consider ‘the crowd’ a valid source as (sort of) an aversion to ‘crowdsourcery’. Crowdsourcing has been described as forbidden, occult and dangerous, so the word crowdsourcery is a pun on this fear of crowdsourcing. The typical response is that the crowd can’t be trusted because individuals are not only unvetted, but untraceable. To groups still hesitating about the effectiveness of crowdsourcing, letting random strangers who may or may not be guessing make decisions would be worse than guessing themselves.

Furthermore, the argument goes, the organizations are experts and the crowd will be full of people making ill-informed decisions. Or worse, the untrained crowd might disagree with the assumptions response organizations had about them, risking credibility or funding from donors.

But the reality is that the crowd isn’t impossible to vet — although it is difficult — and there are aspects of crowdsourcing that can be used as a resource.

Football Vs. Philanthropy

Ironically, this actually means that most humanitarian organizations respond to aid or relief just like football teams do: they design their ‘plays’, they develop worst case scenarios, they huddle together, then they rush onto the field to their adoring fans!

But humanitarian organizations have the advantage of having the time to prepare for pre-conceived scenarios. They don’t make every single decision out on the field in front of the public; a lot of it is done in anticipation. They don’t necessarily operate blind; they measure almost everything that can be measured before executing operations. Crowdsourcing is a bit like inviting that ‘blindness’ back in. Who will say what? How do we maintain the integrity of data? How do we know who to pay attention to? The danger is to ignore the crowd entirely, functioning as if it doesn’t exist as a source of information.

  • Traditional humanitarian programs, like football plays, assume a lot.
  • Philanthropy affects people’s lives so it’s worth exploring untried solutions.
  • There will always be more about a situation that you don’t know.
  • The crowd is willing to be a resource. They want to engage and be engaged.

Chaos Theory and Humanitarian Groups

This seems somewhat obvious. So why do humanitarian groups operate like football teams and military ops? Because they are trying to mitigate chaos.

In chaos theory, a deterministic system operates on the principle that a certain set of conditions at the outset of an experiment will yield an equally certain set of results. One plus two will always equal three. However, due to what’s called the butterfly effect, dynamical systems are actually highly sensitive to those early conditions. The most subtle variations vary even more as time passes, yielding wildly unpredictable results. I’m of the belief that humanitarian operations reflect dynamical systems. Political variables, organization structure, the people selected to run programs: these are variables that, from the outset, affect everything that organization will try to do, because they aren’t ‘constant’. They change over time in reaction to things changing around them. Most organizations try to contain ‘the butterfly effect’ by favoring deterministic methodologies: if we do these things correctly, we’ll get these results.

When we introduce the crowd, then, most organizations react with fear. We’re deliberately reintroducing ‘chaos’ into their orderly systems, the chaos they work so hard at keeping out. Through what I call folksonomic triage, however, I propose we use the crowd itself as a buffer against that chaos.
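
To make the butterfly effect concrete, here is a minimal sketch using the logistic map, a textbook example of a deterministic system that is highly sensitive to its starting conditions. The map, the parameter `r = 4.0` and the starting values are my own illustrative choices, not anything drawn from humanitarian data.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)  # an almost identical starting condition

# Gap between the two runs at each step: tiny at first, large later on.
gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap after 1 step: {gaps[1]:.2e}")
print(f"largest gap seen: {max(gaps):.2f}")
```

Both runs follow exactly the same rule, yet a difference of one part in a billion at the outset eventually grows until the two trajectories have nothing to do with each other.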

Folksonomic Triage

Folksonomy - people defined

Triage - condition based decisions

This is a term that I made up because I feel it describes most systems designed to corroborate individuals based on evidence mined from the crowd. For example, if person 1 says “The sky is red” while the rest of the people in the crowd say “The sky is blue”, we’ll likely favor the crowd’s opinion over the individual who differs. The user in this case is probably misinformed or lying. The spam filter in Gmail is a practical example of this in action: if enough people mark messages containing the same characteristics as spam, Gmail looks for messages containing some or all of those characteristics and, in the future, makes decisions based on what it has learned is most likely to be spam.

If the same person says, “There is an earthquake in San Francisco, California. The city has been leveled!”, how do we know if he or she is telling the truth? We can try to deduce the obvious. We know California is prone to earthquakes; they happen quite frequently there. We also know San Fran is a city that’s monitored by a number of agencies who study earthquakes, but perhaps we haven’t heard from them yet. So it’s possible that this event has occurred, but we still aren’t sure. So let’s look at the crowd. We know San Fran has a lot of people who, if they survived such a quake, would be sophisticated enough to help spread the news via a number of communication channels. A lack of similar information means either the crowd is unable to respond or that there is nothing occurring that they would need to respond to.

So what does the crowd say? Maybe there was indeed an earthquake, but the person in question exaggerated their claims. Maybe the person is in Australia, thousands of miles away from San Fran, and read news that was misinterpreted. Maybe there was no earthquake at all. Are there pics, video, other reports that corroborate the story? Making decisions like this, through systematic reasoning, is called triage. Because we’re making these decisions based on the actions or expressions of other people, the crowd, I call it folksonomic triage.

Sometimes there is no crowd present; maybe it’s just a handful of people who are offering useless, or deliberately misleading, information. Other systems would need to be put in place to try to determine truth. After all, without much of a crowd, it’s not really crowdsourcing. Also, I should say that there is always the possibility of the outlier, the person in the crowd who predicts an event (either based on niche information or chance) or who is simply on the radar of people monitoring a situation before everyone else begins corroborating his or her claims.
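
The reasoning above can be sketched in a few lines. This is only an illustration of the idea, assuming reports have already been reduced to comparable claims; the function name `triage`, the labels it returns and the thresholds are all hypothetical and not taken from any real system.

```python
from collections import Counter


def triage(claim, crowd_reports, min_crowd=5):
    """Classify a claim by how much of the crowd agrees with it.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if len(crowd_reports) < min_crowd:
        return "insufficient crowd"  # too few voices to corroborate anything
    counts = Counter(crowd_reports)
    support = counts[claim] / len(crowd_reports)
    if support >= 0.5:
        return "corroborated"
    if support > 0:
        return "contested"  # a minority agrees: misinformed, lying, or an outlier
    return "uncorroborated"


crowd = ["sky is blue"] * 9 + ["sky is red"]
print(triage("sky is red", crowd))
print(triage("sky is blue", crowd))
print(triage("sky is red", ["sky is red", "sky is blue"]))
```

Note that the sketch never returns “false”. A contested claim is flagged for a closer look, which leaves room for the outlier who turns out to be right.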

Better Living Through Crowdsourcing

Christian Kreutz explores the many technologies the world is using to make sense of real world data in the digital domain. These technologies, apart and collectively, enable computers to more accurately interpret the world as we understand it, in the hopes that they’ll be able to tell us more about our reality than we are able to infer unaided.

Our relationship with these technologies is self-reinforcing: it’s both driven by, and the cause of, an explosion of the ‘sharing’ of content. In other words, the more data we have, the more we want to understand and contextualize it. The more we understand, the greater the motivation to create and share even more.

The Information Age, Amplified

Eric Schmidt, CEO of Google, recently talked about just how fast humans are creating content:

Thanks to the Internet, we now double every two days all stored information. The estimated amount is 5 exabytes according to Eric Schmidt (Google) and it took human kind 2000 years to get a similar amount of archived information.

So how are machines able to parse all this data from the real-world? Well, there are a few ways…

  • Text Recognition and Natural Language Processing
  • Voice Recognition
  • Mobile Data Collection
  • Image Processing and Computer Vision

That’s a few, but also consider a number of other technologies: programs for mining the social graph, mapping, checking-in, active learning… too many to list. The point is, the sum of these parts allows for platforms that attempt to understand media as close to the way humans do as possible. Of course, the benefit of computing is that algorithms work faster and more efficiently than we do. Despite the number of technologies listed above, artificial intelligence isn’t quite where it needs to be to completely automate managing it all.

Just today there were reports that Cuil, a search engine that relied upon semantic parsing algorithms to mine the dark web, might be shutting down. I’m sure their technology was sound and some of the brightest minds in the business started Cuil, but there are real difficulties in relying on machines to do complex tasks where context is the variable.

Crowdsource the Filter

Our approach is to address the problem from a different angle, where humans can distribute work to many, use machines to aggregate the output of that productivity, and then work with smart tools that learn from the user’s needs and expectations. If our code isn’t smart enough to make sense of data on its own (it’s not) but humans are (yet they aren’t as fast or organized), then perhaps part of the solution lies in optimizing human efforts at filtering content, adding context and using the result as the base for improving future algorithmic decisions. This is called active learning, where the interactions of a human operator improve algorithms assigned to perform certain functions.

My colleague Patrick Meier refers to this as Crowdsourcing the Filter. I think, at least in the near term, this is the future of intelligent computing, where smart machines assist humans, helping us to accomplish the tasks we need to accomplish more efficaciously.

At CrowdConf next month on October 4th, SwiftRiver will be onsite demonstrating some of the applications we’ve built from this understanding. This is part of our approach to solving the problem of ‘too much data’. We’ll let the big guys like Google, Microsoft and IBM figure out the secrets to scalable A.I. In the meantime, our goal at SwiftRiver is to democratize access to tools that help people make sense of data, on their terms.
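
As a rough sketch of that loop, the toy filter below scores messages by word weights, lets a human verdict adjust those weights, and picks the message it is least sure about as the next one to show a human. The class `FeedbackFilter` and its methods are hypothetical names invented for this illustration; this is not SwiftRiver code, and a real system would use a proper classifier.

```python
from collections import defaultdict


class FeedbackFilter:
    """Scores messages by word weights that human reviewers adjust over time."""

    def __init__(self):
        self.weights = defaultdict(float)

    def score(self, text):
        return sum(self.weights[w] for w in text.lower().split())

    def is_relevant(self, text):
        return self.score(text) > 0

    def feedback(self, text, relevant):
        # A human verdict nudges every distinct word in the message up or down.
        delta = 1.0 if relevant else -1.0
        for word in set(text.lower().split()):
            self.weights[word] += delta

    def most_uncertain(self, texts):
        # The message scoring closest to zero is the best one to ask a human about.
        return min(texts, key=lambda t: abs(self.score(t)))


f = FeedbackFilter()
f.feedback("bridge collapsed need help", relevant=True)
f.feedback("win a free phone", relevant=False)
print(f.is_relevant("need help at the bridge"))
print(f.is_relevant("free phone offer"))
print(f.most_uncertain(["need help", "free phone", "road closed"]))
```

The humans stay in charge of judgment, while the machine takes over the repetitive part and spends their attention where it is least confident.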

Visualizing Redundant Data Validation

The following visualizations represent the various methods that go into calculating the reputation and veracity scores for users and content within the SwiftRiver platform. They are in part a response to this comment from reader Charles Bernard on this post. His comment:

In many instances, there are entities with a vested interest in preventing valid information regarding things such as voting, battles and even disasters, both natural and man-made.

For nearly any human effort, there exist a group of entities which would profit by either the details or the extent of a problem being kept from the public–and that can include relief agencies.

While tracking particular sources and their validity of reports is a step in the right direction, some entities, in particular governments and large corporations, have access to the resources needed to generate thousands or even hundreds of thousands of false data reports, flooding the system with misinformation.


In other words, what steps are we taking to prevent individuals with malicious intent from gaming SwiftRiver? Here was my response:

With Swift, we aren’t just validating content, we’re also validating users. Users validate each other and content validates users. Content can also be used to verify other content. This creates a system that’s difficult to dupe, as anyone looking to falsify information would need to submit thousands of false reports from a number of different ‘users’, locations, and media channels.

What would be absolutely possible is for a group to download Swift, set up their own instance with all sorts of fake information and publicize it as fact. However, our distributed, decentralized reputation system River ID would show that outside of that instance’s ‘ecosystem’ no one trusts those users, or the instance. If the administrators opt out of tracking, they also forfeit any sort of benefits that come from River ID (trust from users who don’t know you or your site). In this case falsifying information is indeed easy, but promoting it becomes self-defeating, as the more people who aren’t under your influence see it, the less authority your Swift instance (with all its fake reports) actually holds.


I thought these concepts might be hard to grasp, so I made the following Arc Diagrams to give a visual representation of what I actually mean. In the images below, the light grey color is simply used to indicate that content isn’t important for what that particular chart is showing you.

Fig. 1 Individual Voting Against the Community

Figure 1 represents the most classic scenario of ‘gaming’, spam, bots or human individuals who are trying to vote bogus content ‘up’ so it will be weighted higher than other content. Section “A” represents User 1. Section “B” represents the activity of User 2 (our spammer). Section “E” represents the community within this particular Swift instance. Section “F” represents the users of our distributed trust system River ID or the global SwiftRiver economy. Section “C” represents individual content items. Section “D” represents the source that content is coming from.

The thickness of the lines connecting the users to the content and the source represents how they’ve voted on those particular things. The thickness of the line for User 2 tells us that he’s rating these things very highly. Perhaps they come from his blog, and he wants them at the top! The thickness of the lines from the local community of the SwiftRiver instance as well as the global users tells us that these content sources are suspect. We can see that User 1 (who represents our average, active user) is voting closer to how the community is voting, and in fact rates both the content and the source even more harshly than the community does (represented by thinner lines).

This dynamic relationship between users and their interactions with content (in contrast to the local and global community) is considered when scoring users, content, and the sources. In this case the person voting against the tide is actually damaging his or her own reputation both locally and globally. However, this isn’t the only thing we consider, otherwise it would encourage conformity, which also isn’t good (sometimes the outlier knows something the rest don’t).
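
One way to picture the Figure 1 scenario in code is to measure how far a user’s votes sit from the community’s and let that gap pull the user’s reputation down. This is a simplified sketch and not the actual SwiftRiver scoring: the function names, the 0 to 1 vote scale and the `penalty` factor are assumptions made for illustration.

```python
def vote_deviation(user_votes, community_votes):
    """Mean absolute gap between a user's votes and the community's on shared items."""
    shared = user_votes.keys() & community_votes.keys()
    if not shared:
        return 0.0
    return sum(abs(user_votes[i] - community_votes[i]) for i in shared) / len(shared)


def update_reputation(reputation, deviation, penalty=0.5):
    """Lower reputation in proportion to deviation, clamped to the range [0, 1]."""
    return max(0.0, min(1.0, reputation - penalty * deviation))


# Votes on a 0 (distrust) to 1 (trust) scale; thicker lines mean higher votes.
community = {"content_1": 0.2, "content_2": 0.1}
user_1 = {"content_1": 0.1, "content_2": 0.1}   # votes close to the community
user_2 = {"content_1": 1.0, "content_2": 1.0}   # the spammer voting everything up

rep_1 = update_reputation(0.5, vote_deviation(user_1, community))
rep_2 = update_reputation(0.5, vote_deviation(user_2, community))
print(round(rep_1, 3), round(rep_2, 3))
```

A modest `penalty` keeps disagreement from being fatal on its own, which matters because deviation is only one of several signals and the occasional outlier is right.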

Fig. 2 Factors Considered in Rating Content

In Figure 2 we can see that things like Time, Location, Activeness as well as Global and Local interaction, are all considered. Time (green) and Location (dark grey) are optional, for scenarios like a conflict or war. The content producer’s location, or proximity to ‘ground zero’, tells the system to factor this in to its score. The length of time between the initial event and the moment content is produced may also tell us a lot. Things like ‘time’ and ‘location’ are optional because if your Swift instance is tracking something like a political scandal, time and proximity may not actually add any value to authority calculations.

Purple represents how active Users 1 and 2 are. In and of itself, how much someone uses a Swift instance is irrelevant. It could mean that they are an eager member providing valuable assistance, or it could mean they are attempting a brute force attack on the system similar to the Figure 1 scenario. However, when coupled with other factors, frequency of interaction is considered and can positively or negatively weight the score for a user.
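
A simple way to combine factors like these, some of them optional, is a weighted average that skips whatever is missing. The sketch below is only an illustration of that idea; the factor names, the weights and the 0 to 1 scale are assumptions of mine and not the real SwiftRiver calculation.

```python
def content_score(factors, weights):
    """Weighted average of whichever factors are present.

    `factors` maps a factor name to a value in [0, 1]. A missing or None entry
    (for example, location when tracking a political scandal) is skipped.
    """
    used = {k: v for k, v in factors.items() if v is not None and k in weights}
    total_weight = sum(weights[k] for k in used)
    if total_weight == 0:
        return 0.0
    return sum(weights[k] * v for k, v in used.items()) / total_weight


# Hypothetical weights: community trust counts double, activity alone counts little.
WEIGHTS = {"local": 2, "global": 2, "activity": 1, "time": 1, "location": 1}

scandal_report = {"local": 0.8, "global": 0.6, "activity": 0.5,
                  "time": None, "location": None}
print(round(content_score(scandal_report, WEIGHTS), 2))
```

Because the score is normalized by the weights actually used, an instance that turns off time and location is not penalized for leaving them out.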

Fig. 3 Ratings Visible to Users

In Figure 3 I’m illustrating what information is visibly shared in the scenarios above. The trust the local community has for Users 1 and 2 is displayed. The trust the global RiverID system has for Users 1 and 2 is also displayed. Thus, the trust Users 1 and 2 should have for each other is inferred.




Swift’s strength is in multiple points of redundancy. All scores are calculated against a multitude of other factors which may or may not be independent of the local community. This allows users to build scores more organically than x=bad y=good. There are some probabilistic calculations as well as algorithmic intricacies that make all this a lot more complex (a lot of math beyond my paygrade). We also calculate things like tags and content influence which compound the complexity.

Unless the local Swift instance administrators opt in to participating in the global Swift ecosystem, their instance only holds authority with the people using it. In theory, their ‘gaming’ would then be contained to their local Swift instance. The fact that global authority isn’t considered would be an indicator that the public shouldn’t trust it. If they do opt in to the global ecosystem, it becomes increasingly hard to continue gaming the system, as their scores are constantly weighted against the global community’s.

Because Swift is open source, it’s easy to reverse engineer or hack parts of the local system. But this is why we announced Swift Web Services last month: core components of the global system are centralized and well protected. This protects the global ecosystem, but still allows for independent uses of SwiftRiver, and all of its components, as open, locally deployable apps. Some users, for example election monitors, may not want their SwiftRiver instance online at all. In that case, global authority doesn’t matter; the instance can and should only be influential amongst the people using it. This is why we opted for cloud solutions in addition to local deployment options, yet another redundancy to ensure the platform’s usefulness in multiple scenarios.

Post any follow up questions to the newsgroup or in the comments below.

Taxonomy for Text Messages

Getting crowd-sourced information into a system is only the first hurdle; the next is managing it. Last week we announced Swift Web Services, RESTful applications hosted in the cloud that any third-party application or developer can use to assist in managing data. One of those services is SiLCC, a semantic tag extraction service for parsing text and extracting relevant keywords from Tweets and text messages: tags like the names of people and places, actions that need to be taken or locations where things have occurred. It is an open service that we host on our servers, meaning anyone can use it in their applications. It will work with WordPress, Drupal, FrontlineSMS, other aggregators like Managing News and more.

These other applications would send the SiLCC API a feed of content they want tagged. It then extracts keywords and returns a feed of tags linked to the content they refer to. From there they go on to be used however the original app developers decide.
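
The shape of that exchange, content in and tags keyed to content out, can be sketched locally. The code below is a stand-in and does not call or reproduce the real SiLCC API: `extract_tags` and `tag_feed` are hypothetical names, the item fields `id` and `text` are assumed, and the “tagging” is just naive stopword removal.

```python
import re

STOPWORDS = {"the", "a", "an", "in", "at", "is", "are", "to", "of", "and",
             "my", "has", "been"}


def extract_tags(text):
    """Naive stand-in for semantic tagging: lowercase words minus stopwords."""
    tags = []
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in STOPWORDS and word not in tags:
            tags.append(word)
    return tags


def tag_feed(items):
    """Take a feed of {'id', 'text'} items and return tags keyed to each id."""
    return [{"id": item["id"], "tags": extract_tags(item["text"])} for item in items]


feed = [{"id": 1, "text": "Flight to Reykjavik has been cancelled"}]
print(tag_feed(feed))
```

The point is the contract: the calling application keeps its own content and only needs the returned ids to attach tags to the right items.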

For many organizations, this is a critical time saver. It saves humans from having to comb through a system to find useful content. Aggregating content in an Ushahidi instance that uses SiLCC, or in SwiftRiver, would bypass that manual sorting, allowing users to focus on verifying reports and responding to urgent requests.

Tags are the first, autonomous layer of taxonomy for content. They won’t be the only layer, but if you’re monitoring 100 different mobile phones sending in messages referring to a volcanic eruption in Iceland, and you’re looking for the ten that reference one particular cancelled flight, this is one of the quickest ways to couple disparate items.
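
Once messages carry tags, finding the handful that share a topic is a lookup rather than a read-through. Here is a minimal sketch of that coupling using an inverted index; the message data, the tag names and the helper functions are all made up for the example.

```python
from collections import defaultdict


def build_tag_index(tagged_items):
    """Map each tag to the set of message ids that carry it."""
    index = defaultdict(set)
    for item in tagged_items:
        for tag in item["tags"]:
            index[tag].add(item["id"])
    return index


def find_all(index, *tags):
    """Ids of messages carrying every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


messages = [
    {"id": 1, "tags": ["volcano", "iceland", "ash"]},
    {"id": 2, "tags": ["iceland", "flight", "cancelled"]},
    {"id": 3, "tags": ["flight", "cancelled", "refund"]},
    {"id": 4, "tags": ["flight", "delayed"]},
]
index = build_tag_index(messages)
print(sorted(find_all(index, "flight", "cancelled")))
```

Adding more tags to the query narrows the result, which is how 100 incoming messages get cut down to the ten that matter.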

280 Characters or Less

A number of services are out there that offer similar functionality. In fact, we recently partnered with Thomson Reuters, which offers a service called Open Calais that extracts semantic keywords from articles and blogs. Where Open Calais doesn’t work so well is with shorter messages that are less than a paragraph in length. For managing information from mobile phone users, this is a problem because that content falls well below the threshold of Open Calais. So our partnership allows their service to supplement ours and vice-versa.

Active Mobile

SiLCC does one thing in particular differently than many apps out there that might be similar. Rather than exist as a service that has to be improved by the developers (us), we’ve incorporated active learning techniques that allow it to learn autonomously. This is because we don’t know where or when the next crisis that needs to be monitored will occur. We don’t know who will set up the next SwiftRiver instance or what they’ll use it for. So we designed SiLCC to adapt to any and all scenarios by learning from the instance of use, rather than the top-down approach of tweaking the app on demand. This is known as persistent tagging. SiLCC auto-tags content, but also self-improves and accumulates knowledge (rather, conditions that it can use to improve future decisions).

Natural language processing geeks will wonder if they can define their own corpora and add words specific to their organization or event directly to SiLCC. Of course they can; this saves time and also improves performance. Additionally, by default we’ve included corpora for dealing with Twitter ontology as well as the TXTSPK (text speak) commonly used by mobile phone users.
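
To illustrate what layering a custom corpus over a default one might look like, the sketch below expands text speak using a small built-in dictionary and lets an organization add its own terms on top. The dictionary entries and the `normalize` function are invented for this example and are not SiLCC’s actual corpora or interface.

```python
# A tiny, hypothetical default corpus of text speak.
TXTSPK = {"u": "you", "r": "are", "pls": "please", "thx": "thanks", "2": "to"}


def normalize(text, extra_corpus=None):
    """Expand known shorthand; entries in extra_corpus override the defaults."""
    corpus = {**TXTSPK, **(extra_corpus or {})}
    return " ".join(corpus.get(word, word) for word in text.lower().split())


# An organization adds a term specific to its own operation.
print(normalize("pls send h2o 2 camp", {"h2o": "water"}))
```

Keeping the organization’s terms in a separate dictionary means the defaults stay shared while each deployment tunes its own vocabulary.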

Secret Ontology

Finally, the fact that we can predefine corpora gives organizations the option of setting up codes for people to utilize the system remotely. For instance, we could customize an Ushahidi instance to automatically verify and map any text message that contains a unique string (example “Help trapped in Port-au-Prince Market #a1u9”). That trailing string of alphanumeric characters is like a password that tells the system to do something. An organization could set up these unique character strings and functions, giving them only to people they send to the field. In the event of an emergency, that person could communicate with HQ in ways that the other users of the system couldn’t. We have other apps for auto-detecting location, which makes it simple to extract that data as well. Rather than take a laptop into the field to map data, an organization could set up a specific set of keywords that represent locations or events. Then workers, armed only with phones with SMS functionality, could use the system remotely.
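
A minimal sketch of that idea: look for a trailing `#code` in an incoming text message and, if the code is one the organization has issued, return the action it stands for. The code table, the action name `verify_and_map` and the function `parse_sms` are hypothetical, and this is not how Ushahidi or SiLCC actually implements it.

```python
import re

# Hypothetical table of codes handed only to trusted field staff.
FIELD_CODES = {"a1u9": "verify_and_map"}

# A '#' followed by letters or digits at the very end of the message.
CODE_PATTERN = re.compile(r"#([a-z0-9]+)\s*$", re.IGNORECASE)


def parse_sms(message):
    """Return (message text, action); action is None when no known code is found."""
    match = CODE_PATTERN.search(message)
    if match:
        action = FIELD_CODES.get(match.group(1).lower())
        if action is not None:
            return message[:match.start()].strip(), action
    return message.strip(), None


print(parse_sms("Help trapped in Port-au-Prince Market #a1u9"))
print(parse_sms("Help trapped near the market #hello"))
```

Unknown codes are treated as ordinary text, so a regular hashtag from the public never triggers a privileged action. A real deployment would also need to protect the codes themselves, since anyone who learns one can use it.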

This isn’t why we designed the app, and I doubt many orgs will use it this way, but I think it makes for an interesting possible extension of the Ushahidi platform. A more common use will probably be differentiation between actionable (someone needs something done now) and non-actionable reports (nothing needs to be done) for emergency response organizations.




We announced our alpha of SiLCC last week. If you’re interested in applying to be an alpha tester, click here. SiLCC is open source, so if you’d like to contribute to the project as a developer, follow the project on GitHub.