Subjectivity, Veracity and Truth

SwiftRiver is constructed from the viewpoint that there are no absolute truths and that what is considered to be factual by most is still highly subjective or biased depending upon context.

Thus, we build tools that allow users to curate their own depiction of a perspective, in the same way that there is more than one newspaper, more than one political party in most countries, more than one religion, and even more than one ‘official’ source for occurrences like earthquakes or climate change. We build tools that enable people to convey their confidence in datasets. This in no way implies that the data is unbiased.

Swift apps add many layers of context to data as metadata to make processing faster, which in turn gives our own systems more ammunition with which to attempt to understand the data and automate its processing.

What does veracity mean?

Veracity is simply the term we use to represent the baseline of trust that our users have conveyed about content, sources and events. This baseline allows us to do things like recommend related content that is likely to be relevant to a user’s view or, in the case of an organization, to the collective view.

A number of factors go into creating this profile: how content is organized, how it’s interacted with, how people have behaved in the past, how certain communities feel about their members and vice versa, calculations for real-world phenomena like the time and location of an event, and so on.
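To make the idea of a baseline concrete, here is a minimal Python sketch that blends several normalized trust signals into one score with a weighted average. The signal names and weights are hypothetical stand-ins for the factors listed above, not Swift’s actual scoring model.

```python
from dataclasses import dataclass


# Hypothetical trust signals, each assumed to be pre-normalized to [0, 1].
@dataclass
class TrustSignals:
    organization: float   # how the content has been organized (tagged, curated)
    interaction: float    # how users have interacted with it
    history: float        # how the source has behaved in the past
    community: float      # how the community feels about the source
    plausibility: float   # real-world checks such as time and location


# Illustrative weights only; each deployment would set its own.
WEIGHTS = {
    "organization": 0.15,
    "interaction": 0.20,
    "history": 0.30,
    "community": 0.20,
    "plausibility": 0.15,
}


def veracity_baseline(signals: TrustSignals) -> float:
    """Weighted average of the signals; stays in [0, 1] if the inputs do."""
    weighted = sum(w * getattr(signals, name) for name, w in WEIGHTS.items())
    return weighted / sum(WEIGHTS.values())


trusted = TrustSignals(0.9, 0.8, 0.95, 0.85, 0.9)
unknown = TrustSignals(0.5, 0.1, 0.0, 0.2, 0.5)
print(round(veracity_baseline(trusted), 3), round(veracity_baseline(unknown), 3))
```

Keeping the weights explicit makes each deployment’s choices visible, which fits the view that every baseline reflects some bias.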

When it comes to verifying data, our tools serve two purposes.

  1. For a public-facing deployment of our apps (including Ushahidi), we offer tools that allow the user to make a case to the public about a particular view. For example, these are the people in the crowd whom they trust, and what those people had to say about an event.
  2. For a non-public-facing deployment, some of our apps (like Sweeper) can be used to structure data, conditionally filter it and view it. This is useful for setting up automated workflows like ‘pass only approved content tagged with a location and these tags, and send that data to Ushahidi or some other application’.
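The kind of workflow rule quoted in item 2 can be expressed as a simple predicate. This Python sketch is hypothetical: `Item`, `make_rule` and `run_workflow` are illustrative names, not Sweeper’s API, and `forward` stands in for whatever sends data on to Ushahidi or another application.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional, Set, Tuple


@dataclass
class Item:
    text: str
    approved: bool
    location: Optional[Tuple[float, float]] = None  # (lat, lon), if known
    tags: Set[str] = field(default_factory=set)


def make_rule(required_tags: Set[str]) -> Callable[[Item], bool]:
    """Predicate: approved, geolocated and carrying every required tag."""
    def rule(item: Item) -> bool:
        return (
            item.approved
            and item.location is not None
            and required_tags <= item.tags
        )
    return rule


def run_workflow(items: Iterable[Item], rule: Callable[[Item], bool],
                 forward: Callable[[Item], None]) -> int:
    """Pass matching items to `forward` and return how many were sent."""
    sent = 0
    for item in items:
        if rule(item):
            forward(item)  # e.g. a hypothetical Ushahidi client
            sent += 1
    return sent


outbox = []
items = [
    Item("Road blocked", True, (0.0, 0.0), {"roads", "flooding"}),
    Item("Unverified rumor", False, (0.0, 0.0), {"roads", "flooding"}),
    Item("Road blocked, no location", True, None, {"roads", "flooding"}),
]
count = run_workflow(items, make_rule({"roads"}), outbox.append)
print(count, [i.text for i in outbox])
```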

In both cases the users, the people behind the deployment, are creating their unique baseline for trust, and therefore are putting forth what they consider to be accurate, or favored, content.

Isn’t this bias?

Yes. Any system operated by a human, and I would go further to say any machine created by a human, is subject to some sort of bias.

What does it mean to verify data?

In the context of most Ushahidi applications ‘verified’ means corroborated or confirmed by a human. This means on the receiving end, the person ‘verifying’ the data is essentially saying “I’m taking the onus to approve this report because something, or someone, has indicated that this is true.”

Does that mean untrue information can be ‘verified’ either intentionally or accidentally? Yes. All of the terms are highly subjective and people have a number of preconceptions about what these things mean. They are mere abstractions that represent user behavior and intended use.

Verification Levels

It is a mistake to assume that because something is ‘verified’ or has a high veracity score, it is a fact. What these indications actually tell viewers and/or the deployer is that this is the baseline for accuracy set forth by an editing body (the deployer). Even verification of a report multiple times by independent participants will not account for human bias or fallibility.

The numbers are there because they are an additional layer of context (readable by machines and humans) allowing the deployer(s) to curate information based on the trust profile they’ve set forth through their interactions.

Likewise, viewing the same data geo-spatially simply shows one community’s understanding of the collected data and what they perceive it to represent. It’s simply a faster way to view data to build up a baseline of ‘favor’ and then use the scores to filter out the content that is less likely to fit that profile.

- Jon Gosier, Director of Product

Localizing News

The following post was written by a volunteer developer, Vladimir G. Ermakov, a Master’s student at Carnegie Mellon University in Pennsylvania. Over the past few months he took on an ambitious project: contributing code that would allow us to parse news articles and attempt to auto-detect the primary location that is the subject of any given text.


Localizing News by Vladimir Ermakov

The amount of information available in electronic format is rapidly increasing. It is becoming possible to find out in real time about current events in a particular part of the world from electronic data such as news articles, blog entries, Twitter feeds and SMS messages. Even though the data is available, there is an overwhelming amount of it, and it is hard to stay on top of the events that are relevant. Getting informed about recent developments is particularly important in times of crisis, when lives could depend on a timely response. In this project I am exploring ways to pinpoint the location discussed in text documents. I am able to achieve good results by combining location keywords extracted by the Yahoo! Placemaker service with state-of-the-art machine learning and natural language processing techniques.

The basic approach I’ve embarked upon is to extract location keywords from a document using the Yahoo! Placemaker service, and then apply classification techniques to determine which of these locations is most relevant to the document at hand. I conducted experiments with Naïve Bayes and Fisher classifiers using a bag-of-words model for feature extraction, but these did not give good results. I then explored an alternative approach: take the count and position of the location keywords extracted by Placemaker and feed them into an SVM. This proved to be a very effective way of determining the country that is the focus of the document. Lemmatizing location adjectives such as “Russian”, converting them to nouns such as “Russia”, improved the results even further.

While Reuters-21578 was a great dataset for training classifiers and experimenting with the data, its articles were collected 20 years ago. What made this project interesting for me is the possibility of visualizing the news around the world on a map, and of seeing whether a sudden rise in the number of articles published can be an indicator of important events.

To make this possible I had to obtain a recent dataset. Reuters has archived articles from the last several years on their website. I developed a simple crawler that visited news articles from this archive, downloaded them to my server, and extracted the news article text content. I then passed this content off to the Yahoo Placemaker service, and output the data with the location labels into XML files. I then could use my scripts to run the experiments on this new dataset, just like I did with the original data.

I limited my data collection to the most recent articles. The archive contained over 400,000 news articles for 2010, which was too many to download, so I restricted the crawler to randomly pick 10% of the articles from each day of the year. This was still a significant amount of data, 80,000 articles, and fairly representative of the whole archive.
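Sampling a fixed fraction of each day, rather than a fraction of the whole year at once, keeps every day represented. A minimal Python sketch, assuming the archive has already been listed as (date, URL) pairs; the URLs are placeholders, the keep-at-least-one-per-day rule is our own choice, and the crawling and download steps are omitted.

```python
import random
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def sample_per_day(articles: List[Tuple[date, str]], fraction: float = 0.10,
                   seed: int = 0) -> List[Tuple[date, str]]:
    """Randomly keep about `fraction` of each day's articles (at least one per day)."""
    by_day: Dict[date, List[str]] = defaultdict(list)
    for day, url in articles:
        by_day[day].append(url)
    rng = random.Random(seed)  # seeded so a crawl can be reproduced
    sample = []
    for day in sorted(by_day):
        urls = by_day[day]
        k = max(1, round(len(urls) * fraction))
        sample.extend((day, url) for url in rng.sample(urls, k))
    return sample


archive = [(date(2010, 1, d), f"https://example.com/2010/01/{d:02d}/{n}")
           for d in (1, 2) for n in range(50)]
picked = sample_per_day(archive)
print(len(picked))  # 10
```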

After all the experiments I was able to narrow down a working solution for mapping news articles: extract location information from the article using the Yahoo Placemaker service, making sure to lemmatize location adjectives; extract the normalized count and position of location keywords within the article; and apply an SVM classifier to decide which of these locations is most important to the article. The results were encouraging, and I believe this solution is ready to deploy in a real-world application. I am hoping to implement an extension to the SwiftRiver platform in the near future that uses this method to classify news articles by country.
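The feature-extraction step described above can be sketched in Python under stated assumptions: Placemaker is not called (a fixed set of location names stands in for its output), the adjective-to-noun table is a tiny hypothetical sample, “position” is taken to mean the relative position of a location’s first mention, and the trained SVM that would consume these vectors is not shown.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical adjective-to-noun table; a real system needs a full lexicon.
ADJECTIVE_TO_NOUN = {"russian": "russia", "french": "france",
                     "chilean": "chile", "haitian": "haiti"}


def normalize(token: str) -> str:
    """Lowercase a token and map location adjectives to nouns (Russian -> russia)."""
    lowered = token.lower()
    return ADJECTIVE_TO_NOUN.get(lowered, lowered)


def location_features(text: str, locations: Set[str]) -> Dict[str, List[float]]:
    """Map each candidate location to [normalized count, relative first position]."""
    tokens = [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in locations]
    if not hits:
        return {}
    counts = Counter(tok for _, tok in hits)
    first_seen: Dict[str, int] = {}
    for i, tok in hits:
        first_seen.setdefault(tok, i)
    return {
        loc: [counts[loc] / len(hits), first_seen[loc] / len(tokens)]
        for loc in counts
    }


text = "Russian officials met in Moscow. Russia and France signed a deal."
features = location_features(text, {"russia", "france", "chile"})
print(features)
```

Each candidate country gets a short fixed-length vector, so the classifier can compare candidates on how often and how early they are mentioned rather than on the full vocabulary of the article.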


Vladimir’s paper is a much longer, and much more fascinating, read than I could share here. If you’d like to read it, he can be reached by emailing vermakov [at] emu [dot] edu.

We’re working on folding this and other contributions into the next release.  Thanks for the awesome work Vladimir!  Other developers interested in contributing to the Swift platform can find out more here.

WP-Veracity v1.5 Released

WP-Veracity is a plugin we built for WordPress bloggers. Unlike our other plugin, WP-SiLCC, it isn’t powered by any of our APIs; it’s just something we wanted to see done. Essentially it rates your WordPress posts by applying Bayesian algorithms to post popularity and influence, with an adjustment for time so that the titles that rise to the top are always fresh. We designed it for blogs and news sites that produce large volumes of content daily.

The combination of popularity (views), influence (clicks) and time (freshness), with some algorithmic weighting, should allow you to display the most interesting content on your blog rather than simply the most popular. This plug-in is a mashup of Popularity Contest from Crowd Favorite (http://crowdfavorite.com/wordpress/plugins/popularity-contest/) and Bayesian Top Title Learner (http://wordpress.org/extend/plugins/bayesian-top-title-learner/).
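The post doesn’t spell out WP-Veracity’s exact formula, so the following Python sketch is one plausible reading rather than the plugin’s code: it smooths the click-through rate (influence relative to popularity) with a Bayesian average, dampens raw views with a logarithm, and decays the result with a half-life. The prior rate, prior weight and 24-hour half-life are hypothetical parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    views: int        # popularity
    clicks: int       # influence
    age_hours: float  # time since publication


def smoothed_ctr(post: Post, prior_ctr: float, prior_weight: float) -> float:
    """Bayesian average: clicks/views pulled toward a site-wide prior rate."""
    return (post.clicks + prior_ctr * prior_weight) / (post.views + prior_weight)


def interestingness(post: Post, prior_ctr: float = 0.05,
                    prior_weight: float = 100.0,
                    half_life_hours: float = 24.0) -> float:
    popularity = math.log1p(post.views)          # diminishing returns on raw views
    influence = smoothed_ctr(post, prior_ctr, prior_weight)
    freshness = 0.5 ** (post.age_hours / half_life_hours)  # halves every half-life
    return popularity * influence * freshness


viral_old = Post(views=5000, clicks=150, age_hours=96)
niche_new = Post(views=300, clicks=60, age_hours=2)
print(interestingness(viral_old) < interestingness(niche_new))  # True
```

The prior keeps a post with a handful of views and one lucky click from jumping straight to the top, while the decay lets older hits fall off organically.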

You can find it on WordPress.org or GitHub.

Crowdsourcing and Chaos Theory

This is a repost from the Ushahidi Blog. Below you’ll find the basis for my Ignite talk from ICCM10 in Boston originally titled “Veracity Blues: The Trouble with Crowdsourcing”.

[Image: Varsity Blues]

Would you trust these guys with your data?

Not based on appearance, but on experience, or the lack thereof. It depends, right? You might trust them to lead you to a Heisman Trophy, but you probably wouldn’t trust them to lead your disaster relief initiative. This is because expertise matters only when coupled with context. But there’s another analogy here, and that’s how organizations view crowdsourced information.

The people in the above image share a common expertise. They are a small team. They trust each other’s experience. They’ve been vetted and validated as being the best of their lot by a coach. They plan their operations before they fully understand the problem. Who else does this remind you of?

The military. This isn’t a weakness; it’s how the military *has* to respond. Advance insight is not always possible, so whatever the stimulus, the strategy has to be able to adapt to unknown variables. Likewise, when a football team is in a stadium, it doesn’t have the benefit of waiting for a lot of analysis. Decisions have to be made on the fly. Football teams *can’t* rely upon the crowd, and neither can most military operations.

An Aversion to Crowdsourcery

My colleague Patrick Meier likes to refer to the reluctance to consider ‘the crowd’ a valid source as (sort of) an aversion to ‘crowdsourcery’. Crowdsourcing has been described as forbidden, occult and dangerous, so the word ‘crowdsourcery’ is a pun on this fear of crowdsourcing. The typical response is that the crowd can’t be trusted because individuals are not only unvetted but untraceable. To groups still hesitant about the effectiveness of crowdsourcing, letting random strangers, who may or may not be guessing, make decisions would be worse than guessing.

Furthermore, the organizations are the experts, and the crowd will be full of people making ill-informed decisions. Or worse, the untrained crowd might disagree with the assumptions that response organizations made about it, putting their credibility or their funding from donors at risk.

But the reality is that the crowd isn’t impossible to vet — although it is difficult — and there are aspects of crowdsourcing that can be used as a resource.

Football Vs. Philanthropy

[Image: football play]

Ironically, this means that most humanitarian organizations approach aid or relief just like football teams do: they design their ‘plays’, they develop worst-case scenarios, they huddle together and then they rush onto the field to their adoring fans!

But humanitarian organizations have the advantage of having time to prepare for pre-conceived scenarios. They don’t make every single decision out on the field in front of the public; a lot of it is done in anticipation. They don’t necessarily operate blind; they measure almost everything that can be measured before executing operations. Crowdsourcing is a bit like inviting that ‘blindness’ back in. Who will say what? How do we maintain the integrity of the data? How do we know whom to pay attention to? The danger is in ignoring the crowd entirely, functioning as if it doesn’t exist as a source of information.

  • Traditional humanitarian programs, like football plays, assume a lot.
  • Philanthropy affects people’s lives so it’s worth exploring untried solutions.
  • There will always be more about a situation that you don’t know.
  • The crowd is willing to be a resource. They want to engage and be engaged.

Chaos Theory and Humanitarian Groups

This seems somewhat obvious. So why do humanitarian groups operate like football teams and military ops? Because they are trying to mitigate chaos.

In a deterministic system, a certain set of conditions at the outset of an experiment will yield an equally certain set of results: one plus two will always equal three. Chaos theory shows, however, that many dynamical systems, even deterministic ones, are highly sensitive to those initial conditions, a property known as the butterfly effect. The subtlest variations grow as time passes, producing wildly unpredictable results. I believe humanitarian operations behave like dynamical systems. Political variables, organizational structure, the people selected to run programs: these are variables that, from the outset, affect everything the organization will try to do, because they aren’t constant; they change over time in reaction to the things changing around them. Most organizations try to contain ‘the butterfly effect’ by favoring deterministic methodologies: if we do these things correctly, we’ll get these results.
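The butterfly effect is easy to see in a standard textbook system, the logistic map, which is fully deterministic. This Python sketch is our illustration, not part of the original talk: it runs the map twice from starting points that differ by one ten-billionth and prints how far apart the two runs drift.

```python
def logistic_orbit(x0: float, r: float = 4.0, steps: int = 50) -> list:
    """Iterate the deterministic logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)  # a "butterfly-sized" change to the start
for step in (0, 10, 20, 30, 40, 50):
    print(step, abs(a[step] - b[step]))
```

The same inputs always give the same output, yet after a few dozen steps the tiny initial difference has grown until the two runs bear no resemblance to each other.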

When we introduce the crowd, then, most organizations react with fear. We’re deliberately reintroducing ‘chaos’ into their orderly systems, the chaos they work so hard to keep out. Through what I call folksonomic triage, however, I propose we use the crowd itself as a buffer against that chaos.

Folksonomic Triage

Folksonomy - people defined

Triage - condition-based decisions

This is a term I made up because I feel it describes most systems designed to corroborate individuals’ claims based on evidence mined from the crowd. For example, if person 1 says “The sky is red” while the rest of the people in the crowd say “The sky is blue”, we’ll likely favor the crowd’s opinion over the individual who differs; that user is probably misinformed or lying. The spam filter in Gmail is a practical example of this in action: if enough people mark messages with the same characteristics as spam, Gmail looks for messages containing some or all of those characteristics and, in the future, makes decisions based on what it has learned is most likely to be spam.
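A minimal Python sketch of the sky example, assuming claims have already been normalized to comparable strings (the hard part in practice). The function name and the 50% threshold are illustrative. Dissenters are returned for review rather than discarded, since the lone dissenter is occasionally the one who is right.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def triage_by_consensus(claims: Dict[str, str], min_share: float = 0.5
                        ) -> Tuple[Optional[str], List[str]]:
    """Return the crowd's majority claim and the users who disagree with it.

    If no claim is backed by more than `min_share` of the crowd, return
    (None, []) rather than guess.
    """
    if not claims:
        return None, []
    counts = Counter(claims.values())
    consensus, votes = counts.most_common(1)[0]
    if votes / len(claims) <= min_share:
        return None, []
    dissenters = sorted(user for user, claim in claims.items() if claim != consensus)
    return consensus, dissenters


claims = {
    "person1": "the sky is red",
    "person2": "the sky is blue",
    "person3": "the sky is blue",
    "person4": "the sky is blue",
}
print(triage_by_consensus(claims))  # ('the sky is blue', ['person1'])
```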

If the same person says, “There is an earthquake in San Francisco, California. The city has been leveled!”, how do we know whether he or she is telling the truth? We can try to deduce the obvious. We know California is prone to earthquakes; they happen quite frequently there. We also know San Fran is monitored by a number of agencies that study earthquakes, but perhaps we haven’t heard from them yet. So it’s possible that this event has occurred, but we still aren’t sure. So let’s look at the crowd. We know San Fran has a lot of people who, if they survived such a quake, would be sophisticated enough to help spread the news via a number of communication channels. A lack of similar reports means either that the crowd is unable to respond or that nothing is happening that they would need to respond to.

So what does the crowd say? Maybe there was indeed an earthquake, but the person in question exaggerated their claims. Maybe the person is in Australia, thousands of miles away from San Fran, and read news that was misinterpreted. Maybe there was no earthquake at all. Are there pictures, video or other reports that corroborate the story? Making decisions like this through systematic reasoning is called triage. Because we’re making these decisions based on the actions or expressions of other people, the crowd, I call it folksonomic triage.
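The reasoning above can be written down as explicit triage rules. This Python sketch is hypothetical throughout: the `Evidence` fields, the extra weight given to photos or video and the small bonus for quake-prone regions are illustrative choices, not rules from any Swift or Ushahidi tool.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    region_quake_prone: bool   # prior: do earthquakes happen there?
    official_reports: int      # reports from monitoring agencies
    corroborating: int         # independent crowd reports supporting the claim
    contradicting: int         # crowd reports saying nothing happened
    has_media: bool            # photos or video attached to any report


def triage(e: Evidence) -> str:
    """Classify a claim as likely, plausible, doubtful or unconfirmed."""
    if e.official_reports > 0:
        return "likely"
    if e.corroborating == 0 and e.contradicting == 0:
        # Silence: either the crowd can't respond or there is nothing to report.
        return "unconfirmed"
    support = e.corroborating
    support += 2 if e.has_media else 0
    support += 1 if e.region_quake_prone else 0
    return "plausible" if support > e.contradicting else "doubtful"


print(triage(Evidence(True, 0, 0, 0, False)))   # unconfirmed
print(triage(Evidence(True, 0, 12, 1, True)))   # plausible
print(triage(Evidence(True, 0, 0, 25, False)))  # doubtful
```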

Sometimes there is no crowd present; maybe it’s just a handful of people offering useless, or deliberately misleading, information. Other systems would need to be put in place to try to determine the truth. After all, without much of a crowd, it’s not really crowdsourcing. I should also say that there is always the possibility of the outlier: the person in the crowd who predicts an event (based on niche information or on chance), or who is simply on the radar of people monitoring a situation before everyone else begins corroborating his or her claims.

ICCM 2010: Veracity Blues


SwiftRiver Releases Plugins for WordPress



For all you WordPress publishers out there interested in SwiftRiver, there are two official plugins we’re releasing today that bring Swift to your platform of choice: WP-SiLCC and WP-Veracity.

WP-SiLCC



WP-SiLCC is an auto-tagging plug-in. Users who run news sites or aggregators should consider using it to add a basic level of taxonomy to all posts. WP-SiLCC also allows users to tag their own posts, for sites that prefer a more folksonomic approach. WP-SiLCC uses active learning techniques to improve how it parses text over time.

Download WP-SiLCC from WordPress.org

WP-Veracity



WP-Veracity applies Bayesian algorithms to your content to help surface posts based on “interestingness”, influence and time published rather than on popularity alone. From SwiftRiver’s perspective, popularity is only an indicator of influence, not necessarily an indicator of authority. This plug-in calculates popularity (number of hits, trackbacks and comments), a Bayes score and time (older content falls off organically) to offer a better picture of the most interesting posts on your blog at any given time.

Download WP-Veracity from WordPress.org




For developers interested in creating their own plugins using Swift Web Services, visit our documentation wiki.

Asking Questions, Verifying Answers


Sean Conner recently asked a great question about integrating a question-and-answer service like Aardvark or Yahoo Answers into SwiftRiver. Here is our approach at Team Swift.

In a Swift instance, Aardvark could be used as an additional ‘channel’ of input. Existing channels are Twitter, Email, SMS, News, RSS (any RSS feed), and Other (the catch-all for items coming in via our API). The only thing the Swift app wants to do is receive content, allow users and our algorithms to tag that content, and, based on user behavior, score the originating content source.

As an example for Aardvark: Johnny asks the question “Did an earthquake really happen in Chile?” on Vark.com on Feb 28th, only a day after the quake actually occurred. Robert responds on Vark with “No, at least I haven’t heard of one.” Vark user Jeremy responds with “Actually, Yes. An 8.8 magnitude earthquake occurred in Chile on Feb 27th.” In Swift, the answer and the accuracy of that answer are more important to us than the actual question (which just provides context).

To integrate Aardvark into Swift, we’d probably write a module, using their API, that aggregates answers with the corresponding question as the ‘description’. Here is an example of how that data would be posted to the SwiftRiver API:

Title: “Actually, Yes. An 8.8 magnitude earthquake occurred in Chile on Feb 27th.”
Description: “Did an earthquake really happen in Chile? - Johnny”
Time: 17:08 EST
Date: Feb 28, 2010
Source: Jeremy’s user id on Vark.com
Channel: Vark API
Lat: 10.31
Lon: 01.40
Tags: 8.8, earthquake, chile

Title: “No, at least I haven’t heard of one.”
Description: “Did an earthquake really happen in Chile? - Johnny”
Time: 03:10 EST
Date: Feb 28, 2010
Source: Robert’s user id on Vark.com
Channel: Vark API
Lat: 10.31
Lon: 01.40
Tags: heard, chile, earthquake
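The records above map naturally onto a small data structure. This Python sketch mirrors their fields; the field names, the ISO 8601 timestamp, the JSON serialization and the user id `vark-user-jeremy` are assumptions for illustration, since neither the SwiftRiver API’s payload format nor Vark’s id format is given here. The actual HTTP call is omitted.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SwiftItem:
    title: str                   # the answer text
    description: str             # the question and asker, for context
    source: str                  # the answerer's Vark user id: what gets scored
    channel: str                 # where the item came from
    timestamp: str               # ISO 8601, with the time zone offset
    tags: List[str] = field(default_factory=list)
    lat: Optional[float] = None  # filled in later, e.g. by a location service
    lon: Optional[float] = None


def answer_to_item(question: str, asker: str, answer: str,
                   answerer_id: str, timestamp: str,
                   tags: List[str]) -> SwiftItem:
    """Wrap one answer as an item, keeping the question only as context."""
    return SwiftItem(title=answer, description=f"{question} - {asker}",
                     source=answerer_id, channel="Vark API",
                     timestamp=timestamp, tags=tags)


item = answer_to_item(
    "Did an earthquake really happen in Chile?", "Johnny",
    "Actually, Yes. An 8.8 magnitude earthquake occurred in Chile on Feb 27th.",
    "vark-user-jeremy", "2010-02-28T17:08:00-05:00",
    ["8.8", "earthquake", "chile"])
print(json.dumps(asdict(item), indent=2))
```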


Within Swift, this is the primary information we need to verify content. Users with a careful eye will notice that we’ve included location data that Vark may or may not provide; we can easily extract that info using the hosted service SULSa. Here, the source is what we’re scoring. The channel is just an indicator for the user of where the content is coming from. Note that the source is not Vark itself, nor the user’s answer on Vark, but rather the user id on Vark.

Thus, if Robert keeps giving inaccurate answers, he maintains a very low score in Swift, while Jeremy is viewed as the more trusted authority. Of course, this approach assumes that Vark.com offers an API that allows for this type of data aggregation, which I don’t think it currently does. Perhaps it’s a question for the Vark team?
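One simple way to let a source’s score follow its track record is Laplace smoothing. This Python sketch is our illustration, not Swift’s actual scoring algorithm; it assumes each answer is eventually judged accurate or inaccurate by users, and the source ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class SourceScores:
    """Track each source's record and score it as (accurate + 1) / (judged + 2).

    This is Laplace's rule of succession: the mean of a Beta(1, 1) prior
    updated with the source's track record. New sources start at 0.5.
    """

    def __init__(self) -> None:
        self.accurate: Dict[str, int] = defaultdict(int)
        self.judged: Dict[str, int] = defaultdict(int)

    def record(self, source: str, was_accurate: bool) -> None:
        self.judged[source] += 1
        if was_accurate:
            self.accurate[source] += 1

    def score(self, source: str) -> float:
        return (self.accurate.get(source, 0) + 1) / (self.judged.get(source, 0) + 2)


scores = SourceScores()
for _ in range(3):
    scores.record("robert", was_accurate=False)
    scores.record("jeremy", was_accurate=True)
print(scores.score("robert"), scores.score("jeremy"), scores.score("newcomer"))  # 0.2 0.8 0.5
```

The +1/+2 smoothing keeps a single judgment from pinning a source at 0 or 1, so a newcomer’s first bad answer lowers their score without writing them off.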