SwiftRiver is a platform made up of several distinct products and technologies. The goal is to aggregate information from multiple media channels (SMS, Twitter, email, RSS feeds from the web) and to add context: the ‘who’, ‘what’ and ‘where’ of what is being discussed in each message. That is, who the message is about, what it’s about, and where the message originated. Swift then uses these details to help predict the relevance of the information coming to the user. This allows us to promote content the user cares about while suppressing content they are less likely to care about (spam, inaccuracies, falsehoods, and crosstalk).
One of the technologies in the works for the Swift platform is RiverID, a distributed reputation system. It works through a process we call ‘triage’, where two or more (usually three) types of data are compared to draw insights that aren’t possible when looking at any one type of data alone.
Let’s use the recent earthquakes in Haiti as an example of how this works. Let’s say we get a message that says “People trapped in a severely unstable building in Neighborhood X.” Our question becomes, who is telling us this? Can they be trusted, and is the information accurate? Traditionally all these questions have to be asked and answered on the fly. That creates a bottleneck on how much information an organization can process: they either put trusted people in the field or they work with vetted organizations on the ground. This isn’t possible for organizations who want to gather crowd-sourced reports. The problem still exists and it’s now amplified because there are even more anonymous people who need to be vetted.
With the above message there are a few ways to attempt verification of what’s being reported. We might start with location. If we know the text message originated from someone in Haiti (there are ways to do this; looking at the country code is one), that location information can then inform our triage dataset.
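To make the country-code idea concrete, here is a minimal sketch. The function name and the small lookup table are made up for illustration and are not part of Swift. Keep in mind that a calling code tells you where a number is registered rather than where the sender is standing, so this is one input among several.

```python
from typing import Optional

# Hypothetical lookup table for illustration; a real deployment would use a
# complete list of international calling codes.
COUNTRY_CODES = {
    "+509": "Haiti",
    "+254": "Kenya",
    "+1": "North America (shared +1 code)",
}


def guess_country(phone_number: str) -> Optional[str]:
    """Return the region whose calling code prefixes the number, if known."""
    normalized = phone_number.replace(" ", "").replace("-", "")
    # Check longer codes first so a short code never shadows a longer one.
    for code in sorted(COUNTRY_CODES, key=len, reverse=True):
        if normalized.startswith(code):
            return COUNTRY_CODES[code]
    return None
```

A number beginning with +509 would be tagged as Haiti, and anything not in the table simply comes back as unknown rather than guessed.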
The second form of context we can attempt to add is corroboration. Are there other reports coming from the same general location and time that corroborate what this message is telling us? If everyone in Neighborhood X is saying that it’s a perfectly sunny day and the kids are playing outside, we have a conflict. Either the crowd is lying or the text message is. So we compare one message with others to see if the stories align, and that becomes an addition to our data set. This used to take a lot of human hours. We want to speed up that process by using algorithms and natural language processing.
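As a rough illustration of what automated corroboration could look like, the sketch below compares a report against other reports from the same place and time window using simple keyword overlap (Jaccard similarity). This is a stand-in for real natural language processing, not Swift’s actual algorithm; the `Report` class, the stop-word list and the six-hour window are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Tiny illustrative stop-word list; real NLP would do far more than this.
STOP_WORDS = {"a", "an", "the", "in", "is", "are", "and", "of", "it"}


@dataclass
class Report:
    text: str
    location: str
    received_at: datetime


def keywords(text: str) -> set:
    """Lower-case the words, strip punctuation, and drop stop words."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return words - STOP_WORDS - {""}


def corroboration_score(report: Report, others: List[Report],
                        window: timedelta = timedelta(hours=6)) -> Optional[float]:
    """Average keyword overlap with reports from the same place and time.

    Returns None when there is nothing nearby to compare against.
    """
    nearby = [o for o in others
              if o.location == report.location
              and abs(o.received_at - report.received_at) <= window]
    if not nearby:
        return None
    target = keywords(report.text)
    scores = []
    for other in nearby:
        other_words = keywords(other.text)
        union = target | other_words
        scores.append(len(target & other_words) / len(union) if union else 0.0)
    return sum(scores) / len(scores)
```

A high score suggests the crowd is telling a similar story; a score near zero is the “sunny day” conflict described above and is a cue for a human to take a closer look.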
The third data set (the last mile) is where this all becomes fun, because location and corroboration can tell us a lot but they aren’t always perfect indicators. So we attempt to look at history. Has this person reported anything before? If so, were they reliable then? Do we know their telephone number? In other words, can we use history as context? This is where RiverID comes in. RiverID allows a user or organization to build a profile of a source’s communication graph. If I (as a user of Swift) know someone’s name and have their phone number, email address, blog URL, and social network profiles, I can store all that data as a profile of the source. Then in the future, if I get a text message out of the blue from Haiti, it just may end up being from someone I have a profile on.
The text message is no longer coming from an anonymous source in the crowd; it’s now coming from an identifiable source with a unique history. From that point it’s just a matter of looking back at that user’s history to try to make a decision. If they tend to be reliable and accurate, their RiverID profile will give a statistical advantage to the actions they take to verify other reports.
I should add that we (at SwiftRiver) never have access to all that user data. Only the organizations using our platform do; it all happens on their servers or behind their firewalls. We never touch their data, nor would we ever need to, as every use of SwiftRiver is going to have a different context and, consequently, differing needs. RiverID data might only be relevant in specific contexts. Essentially we’ve taken the idea of something like Facebook Connect and we’re making it completely opt-in and completely decentralized (the organization using Swift stores the user profiles; we just reference their database). This allows the organization to access reputation profiles unique to its group’s needs.
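To sketch how a profile lookup and a history check might fit together: RiverID’s real data model isn’t described here, so the `SourceProfile` and `ProfileStore` classes below are purely hypothetical, as is the choice of a smoothed hit rate as the reliability measure. The idea is only to show an incoming identifier (a phone number, say) being matched to a stored profile whose track record can then be consulted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SourceProfile:
    name: str
    # Phone numbers, email addresses, blog URLs, social network handles.
    identifiers: Set[str]
    # One entry per past report: True if it was later judged accurate.
    history: List[bool] = field(default_factory=list)

    def reliability(self) -> float:
        """Smoothed share of accurate reports (assumed scoring rule).

        Adding one imaginary hit and one imaginary miss keeps a source with
        no history at 0.5 instead of an undefined or extreme value.
        """
        hits = sum(self.history)
        return (hits + 1) / (len(self.history) + 2)


class ProfileStore:
    """In-memory index from any known identifier to its profile."""

    def __init__(self) -> None:
        self._by_identifier: Dict[str, SourceProfile] = {}

    def add(self, profile: SourceProfile) -> None:
        for identifier in profile.identifiers:
            self._by_identifier[identifier.lower()] = profile

    def find(self, identifier: str) -> Optional[SourceProfile]:
        return self._by_identifier.get(identifier.lower())
```

In keeping with the decentralized design, a store like this would live on the organization’s own servers; a message from an unknown number just returns no profile and stays in the anonymous pile until someone vets it.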
On a final note, I should say that triage may not always consist of the same data types. In this case it was location, corroboration and user history; in other cases it might include things like the time of the report or accuracy (as determined by the user).