Crowdsourcing and Chaos Theory

This is a repost from the Ushahidi Blog. Below you’ll find the basis for my Ignite talk from ICCM10 in Boston originally titled “Veracity Blues: The Trouble with Crowdsourcing”.


Would you trust these guys with your data?

Not based on appearance, but on experience, or the lack of it. It depends, right? You might trust them to lead you to a Heisman Trophy, but you probably wouldn’t trust them to lead your disaster relief initiative. This is because expertise matters only when coupled with context. But there’s another analogy here: how organizations view crowdsourced information.

The people in the above image share a common expertise. They are a small team. They trust each other’s experience. They’ve been vetted and validated by a coach as the best of their lot. They plan their operations before they fully understand the problem. Who else does this remind you of?

The military. This isn’t a weakness. It’s how the military *has* to respond. Advance insight is not always possible, so whatever the stimulus, the strategy has to be able to adapt to unknown variables. Likewise, when a football team is in a stadium, it doesn’t have the benefit of waiting for a lot of analysis. Decisions have to be made on the fly. Football teams *can’t* rely upon the crowd, nor can most military operations.

An Aversion to Crowdsourcery

My colleague Patrick Meier likes to describe the reluctance to consider ‘the crowd’ a valid source as (sort of) an aversion to ‘crowdsourcery’. Crowdsourcing has been described as forbidden, occult and dangerous, so the word crowdsourcery is a pun on this fear of crowdsourcing. The typical response is that the crowd can’t be trusted because individuals are not only unvetted, but untraceable. Letting random strangers, who may themselves be guessing, make decisions would be worse than guessing, or at least it would seem so to groups still hesitant about the effectiveness of crowdsourcing.

Furthermore, the argument goes, the organizations are the experts and the crowd will be full of people making ill-informed decisions. Or worse, the untrained crowd might contradict the assumptions response organizations had made, putting their credibility or donor funding at risk.

But the reality is that the crowd isn’t impossible to vet (although it is difficult), and there are aspects of crowdsourcing that can be used as a resource.

Football Vs. Philanthropy


Ironically, this means that most humanitarian organizations approach aid and relief just like football teams do: they design their ‘plays’, they develop worst-case scenarios, they huddle together, and then they rush onto the field to their adoring fans!

But humanitarian organizations have the advantage of time to prepare for preconceived scenarios. They don’t make every single decision out on the field in front of the public; a lot of it is done in anticipation. They don’t necessarily operate blind; they measure almost everything that can be measured before executing operations. Crowdsourcing is a bit like inviting that ‘blindness’ back in. Who will say what? How do we maintain the integrity of data? How do we know who to pay attention to? The danger lies in ignoring the crowd entirely, functioning as if it doesn’t exist as a source of information.

  • Traditional humanitarian programs, like football plays, assume a lot.
  • Philanthropy affects people’s lives, so it’s worth exploring untried solutions.
  • There will always be more about a situation that you don’t know.
  • The crowd is willing to be a resource. They want to engage and be engaged.

Chaos Theory and Humanitarian Groups

This seems somewhat obvious. So why do humanitarian groups operate like football teams and military ops? Because they are trying to mitigate chaos.

In a deterministic system, a given set of conditions at the outset of an experiment will always yield the same set of results. One plus two will always equal three. Chaos theory, however, shows that some dynamical systems, even fully deterministic ones, are highly sensitive to those initial conditions, a property popularly called the butterfly effect. The most subtle variations grow as time passes, yielding wildly unpredictable results. I believe humanitarian operations behave like dynamical systems. Political variables, organizational structure, the people selected to run programs: from the outset, these variables affect everything an organization will try to do, because they aren’t constant. They change over time in reaction to the things changing around them. Most organizations try to contain ‘the butterfly effect’ by favoring deterministic methodologies: if we do these things correctly, we’ll get these results.
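To make the butterfly effect concrete, here is a small sketch using the logistic map, a textbook chaotic system that was not part of the original talk. The map is fully deterministic: the same starting value always produces the same sequence. Yet two starting values that differ by only 0.0000001 soon produce completely different sequences. The function names and the choice of r = 4 are illustrative assumptions.

```python
def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 60) -> list[float]:
    """Iterate the logistic map x -> r * x * (1 - x), returning every state."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


def divergence(a: list[float], b: list[float]) -> list[float]:
    """Absolute gap between two trajectories at each step."""
    return [abs(x - y) for x, y in zip(a, b)]


if __name__ == "__main__":
    base = logistic_trajectory(0.2)
    nudged = logistic_trajectory(0.2000001)  # starting point nudged by 0.0000001
    gaps = divergence(base, nudged)
    for step in (0, 10, 20, 30, 40):
        print(f"step {step:2d}: gap = {gaps[step]:.7f}")
```

The gap starts out invisible and, within a few dozen steps, becomes as large as the values themselves. No randomness is involved, only sensitivity to where the system started.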

When the crowd is introduced, then, most organizations react with fear. We’re deliberately reintroducing ‘chaos’ into their orderly systems, the very chaos they work so hard to keep out. Through what I call folksonomic triage, however, I propose we use the crowd itself as a buffer against that chaos.

Folksonomic Triage

Folksonomy - people-defined

Triage - condition-based decisions

This is a term I made up because I feel it describes most systems designed to corroborate individuals’ claims based on evidence mined from the crowd. For example, if person 1 says “The sky is red” while the rest of the people in the crowd say “The sky is blue”, we’ll likely favor the crowd’s opinion over that of the individual who differs. The user in this case is either misinformed or lying. The spam filter in Gmail is a practical example of this in action: if enough people mark messages with the same characteristics as spam, Gmail looks for messages with some or all of those characteristics and, in the future, makes decisions based on what it has learned is most likely to be spam.
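The simplest form of folksonomic triage can be sketched as a majority check. This sketch assumes each report can be reduced to a short claim string and that agreement is measured by simple majority. The function names and the `min_share` threshold are hypothetical and chosen for illustration. Real spam filters use far richer signals than an exact-match vote.

```python
from collections import Counter


def normalize(claim: str) -> str:
    """Crude normalization so trivially different wordings count as one claim."""
    return claim.strip().lower().rstrip(".!")


def triage_by_consensus(reports: dict[str, str], min_share: float = 0.6):
    """Return (consensus_claim, outliers), or (None, []) when there is no clear consensus.

    `reports` maps a reporter id to the claim that reporter made.
    `min_share` (hypothetical) is the fraction of reporters who must agree
    before a claim is treated as the crowd's view.
    """
    if not reports:
        return None, []  # no crowd, so nothing to triage against
    normalized = {who: normalize(claim) for who, claim in reports.items()}
    claim, votes = Counter(normalized.values()).most_common(1)[0]
    if votes / len(normalized) < min_share:
        return None, []  # the crowd is split; other checks are needed
    outliers = [who for who, c in normalized.items() if c != claim]
    return claim, outliers


if __name__ == "__main__":
    reports = {
        "person1": "The sky is red",
        "person2": "The sky is blue",
        "person3": "the sky is blue",
        "person4": "The sky is blue.",
    }
    print(triage_by_consensus(reports))  # ('the sky is blue', ['person1'])
```

Returning `(None, [])` for a split crowd is a deliberate choice in this sketch. When there is no clear majority, nobody is labeled an outlier, and the claim is left for other forms of verification.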

If the same person says, “There is an earthquake in San Francisco, California. The city has been leveled!”, how do we know whether he or she is telling the truth? We can try to deduce the obvious. We know California is prone to earthquakes; they happen quite frequently there. We also know San Fran is monitored by a number of agencies that study earthquakes, but perhaps we haven’t heard from them yet. So it’s possible that this event has occurred, but we still aren’t sure. So let’s look at the crowd. We know San Fran has a lot of people who, if they survived such a quake, would be sophisticated enough to spread the news via a number of communication channels. A lack of similar reports means either that the crowd is unable to respond or that nothing is happening that they would need to respond to.

So what does the crowd say? Maybe there was indeed an earthquake, but the person in question exaggerated their claims. Maybe the person is in Australia, thousands of miles away from San Fran, and read news that was misinterpreted. Maybe there was no earthquake at all. Are there pictures, videos or other reports that corroborate the story? Making decisions like this through systematic reasoning is called triage. Because we’re making these decisions based on the actions or expressions of other people, the crowd, I call it folksonomic triage.
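The earthquake reasoning can be sketched as a small rule-based triage function. This is a hypothetical illustration, not Ushahidi’s method. The `Report` fields, the category names and the thresholds (three local supporting reports, at least one with media) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Report:
    reporter: str
    location: str            # where the reporter says they are
    claims_event: bool       # True if the report says the event happened
    has_media: bool = False  # photo or video attached


def triage_claim(location: str, reports: list[Report],
                 official_confirmation: bool = False) -> str:
    """Classify a claim about an event at `location` using other reports.

    Categories and thresholds are hypothetical illustrations.
    """
    if official_confirmation:
        return "confirmed"  # e.g. a monitoring agency has reported it
    local = [r for r in reports if r.location == location]
    if not local:
        return "unverified: no crowd"  # silence is ambiguous; other checks needed
    supporting = [r for r in local if r.claims_event]
    denying = [r for r in local if not r.claims_event]
    if len(supporting) >= 3 and any(r.has_media for r in supporting):
        return "likely"
    if len(denying) > len(supporting):
        return "doubtful"
    return "unverified"


if __name__ == "__main__":
    reports = [
        Report("person1", "Sydney", claims_event=True),  # the far-away reporter
        Report("a", "San Francisco", claims_event=False),
        Report("b", "San Francisco", claims_event=False),
    ]
    print(triage_claim("San Francisco", reports))  # doubtful
```

Only reports from the place in question count toward corroboration. The sketch therefore treats the distant reporter as having no local support, and it treats the locals’ denials as evidence against the claim.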

Sometimes there is no crowd present; maybe it’s just a handful of people offering useless, or deliberately misleading, information. Other systems would need to be put in place to try to determine the truth. After all, without much of a crowd, it’s not really crowdsourcing. I should also say that there is always the possibility of the outlier: the person in the crowd who predicts an event (based on either niche information or chance), or who is simply on the radar of people monitoring a situation before everyone else begins corroborating his or her claims.

ICCM 2010: Veracity Blues


Better Living Through Crowdsourcing

Christian Kreutz explores the many technologies the world is using to make sense of real-world data in the digital domain. These technologies, separately and collectively, enable computers to interpret the world more accurately, the way we understand it, in the hope that they’ll be able to tell us more about our reality than we are able to infer unaided.

Our relationship with these technologies is self-reinforcing: it’s both driven by, and the cause of, an explosion in the ‘sharing’ of content. In other words, the more data we have, the more we want to understand and contextualize it. The more we understand, the greater the motivation to create and share even more.

The Information Age, Amplified

Eric Schmidt, CEO of Google, recently talked about just how fast humans are creating content:

Thanks to the Internet, we now double every two days all stored information. The estimated amount is 5 exabytes according to Eric Schmidt (Google) and it took human kind 2000 years to get a similar amount of archived information.

So how are machines able to parse all this data from the real world? Well, there are a few ways:

  • Text Recognition and Natural Language Processing
  • Voice Recognition
  • Mobile Data Collection
  • Image Processing and Computer Vision

Those are a few, but also consider a number of other technologies: programs for mining the social graph, mapping, checking in, active learning, and too many more to list. The point is, the sum of these parts allows for platforms that attempt to understand media as closely as possible to the way humans do. Of course, the benefit of computing is that algorithms work faster and more efficiently than we do. Despite the number of technologies listed above, artificial intelligence isn’t quite where it needs to be to fully automate managing it all.

Just today there were reports that Cuil, a search engine that relied upon semantic parsing algorithms to mine the dark web, might be shutting down. I’m sure their technology was sound and some of the brightest minds in the business started Cuil, but there are real difficulties in relying on machines to do complex tasks where context is the variable.

Crowdsource the Filter

Our approach is to address the problem from a different angle: humans distribute work to many people, machines aggregate the output of that work, and people then work with smart tools that learn from users’ needs and expectations. If our code isn’t smart enough to make sense of data on its own (it’s not), but humans are (yet they aren’t as fast or organized), then perhaps part of the solution lies in optimizing human efforts at filtering content and adding context, and using the result as the basis for improving future algorithmic decisions. This is closely related to active learning, in which human operators label the items an algorithm is least sure about and their answers improve the algorithm.
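A minimal sketch of that human-in-the-loop idea follows. It assumes a toy naive Bayes word filter and uncertainty sampling, a standard active learning strategy. The sketch is not SwiftRiver code. `WordFilter`, `active_learning` and `ask_human` are hypothetical names, and `ask_human` stands in for a volunteer who labels one item at a time.

```python
import math
from collections import Counter


class WordFilter:
    """Toy multinomial naive Bayes filter: relevant (True) vs noise (False)."""

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}  # word counts per label
        self.docs = {True: 0, False: 0}                    # labeled items per label

    def learn(self, text: str, relevant: bool) -> None:
        self.counts[relevant].update(text.lower().split())
        self.docs[relevant] += 1

    def prob_relevant(self, text: str) -> float:
        vocab = len(set(self.counts[True]) | set(self.counts[False]))
        n_docs = self.docs[True] + self.docs[False]
        log_score = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            # Laplace-smoothed prior and word likelihoods (+1 slot for unseen words)
            score = math.log((self.docs[label] + 1) / (n_docs + 2))
            for word in text.lower().split():
                score += math.log((self.counts[label][word] + 1) / (total + vocab + 1))
            log_score[label] = score
        # Convert the two log-scores into P(relevant), clamped to avoid overflow
        diff = max(min(log_score[False] - log_score[True], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


def active_learning(pool: list[str], ask_human, rounds: int):
    """Uncertainty sampling: repeatedly ask a human about the least certain item."""
    model = WordFilter()
    unlabeled = list(pool)
    asked = []
    for _ in range(min(rounds, len(unlabeled))):
        # The item whose predicted probability is closest to 0.5 is most informative
        item = min(unlabeled, key=lambda t: abs(model.prob_relevant(t) - 0.5))
        unlabeled.remove(item)
        model.learn(item, ask_human(item))
        asked.append(item)
    return model, asked


if __name__ == "__main__":
    pool = [
        "earthquake downtown buildings collapsed",
        "great coffee this morning",
        "earthquake felt near the bridge",
        "new phone who dis",
        "buildings collapsed on market street",
        "watching the game tonight",
    ]

    def volunteer(text: str) -> bool:
        # Stand-in for a human: anything about an earthquake or collapse is relevant
        return "earthquake" in text or "collapsed" in text

    model, asked = active_learning(pool, volunteer, rounds=4)
    print(asked)
    print(round(model.prob_relevant("earthquake collapsed"), 2))
```

The human only sees the items the model is least sure about, so each answer does the most work. The machine handles the volume, and the person supplies the context.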

My colleague Patrick Meier refers to this as Crowdsourcing the Filter. I think that, at least in the near term, this is the future of intelligent computing: smart machines assisting humans, helping us accomplish the tasks we need to accomplish more effectively.

At CrowdConf next month on October 4th, SwiftRiver will be onsite demonstrating some of the applications we’ve built from this understanding. This is part of our approach to solving the problem of ‘too much data’. We’ll let the big guys like Google, Microsoft and IBM figure out the secrets to scalable AI. In the meantime, our goal at SwiftRiver is to democratize access to tools that help people make sense of data, on their terms.