Open Source Bookmark Curation

With the latest release of Sweeper, you can roll your own bookmarking service. This becomes really powerful once you start activating plugins like our auto-tagger SiLCC or our Push plugins, which can output all of your bookmarked content as a feed that other applications can consume.

We call this little plugin Quiver. It’s where you manually collect and store information using Sweeper. Essentially it turns Sweeper into your own free and open source Delicious clone, with all the contextualization and aggregation features that people have come to love it for.

So how does it work? It’s simple! Just download and install Sweeper v0.3.2 (the current release) or any later version, which can be found here.

Once you’ve done that, go to the ‘sources panel’.

Select ‘Quiver’ from the list.

Drag the bookmarklet to your browser bar.

Done! Sweeper is a tool for the curation of real-time media. Now the things you find interesting can be mashed up with the content you’re aggregating from the web, Twitter, email and other feeds! It’s particularly useful for journalists or researchers who need real-time content, but who want to augment it with their own interests and findings.

Get it from Swiftly.org

Algorithms Augmenting Human Decisions

Here’s an update about the SwiftRiver platform from PDF11, where I had the pleasure of speaking yesterday. My slides are below, and you can find video of my presentation here.

Crowdsourcing 102: Mining Real-Time Data  

The summary of the talk is that the Swift project has been assigned a complex and difficult task: to verify and contextualize data from the mobile and social web. How do we do this? This seems to be the part that confuses people. It’s not any one of our apps, and it’s not any individual API that we rely upon to do this. It’s the combination of all these things together, as part of one robust algorithm that tries to digitally reconstruct the real-world context, using the features extracted from the content to prioritize and de-prioritize information relevant to that context.

I like to refer to this as folksonomic triage, where historic, social, temporal, geospatial and other types of information are layered on top of one another to perform a function, and the system (through a process called active learning) then learns how to improve from the user’s interaction. This allows the human to give the machine algorithms some insight into the types of content they prefer, and the types of content they don’t. A statistical profile of the content features of each type is recorded, with varying degrees of nuance in between, including accounting for bias, crosstalk, irrelevancy and falsehoods.
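To make the feedback loop concrete, here is a minimal sketch in Python. It is not the Swift project's actual algorithm: the class name, the feature strings and the smoothed log-odds scoring are all assumptions chosen for illustration. It records a statistical profile of the features on items a user keeps or discards, scores new items against that profile, and picks the item the profile is least sure about as the next one to show the user.

```python
import math
from collections import Counter


class PreferenceProfile:
    """Toy statistical profile of content features (hypothetical, for illustration)."""

    def __init__(self):
        self.counts = {"keep": Counter(), "discard": Counter()}
        self.totals = {"keep": 0, "discard": 0}

    def record(self, features, label):
        """Update the profile from one user decision: 'keep' or 'discard'."""
        for feature in features:
            self.counts[label][feature] += 1
        self.totals[label] += 1

    def score(self, features):
        """Sum of smoothed log-odds per feature; positive leans towards 'keep'."""
        total = 0.0
        for feature in features:
            keep = (self.counts["keep"][feature] + 1) / (self.totals["keep"] + 2)
            discard = (self.counts["discard"][feature] + 1) / (self.totals["discard"] + 2)
            total += math.log(keep / discard)
        return total


def most_uncertain(profile, items):
    """Pick the item whose score is closest to zero, i.e. the best one to ask about."""
    return min(items, key=lambda features: abs(profile.score(features)))


# Made-up feature strings standing in for extracted tags, places and sources.
profile = PreferenceProfile()
profile.record({"tag:election", "geo:nairobi"}, "keep")
profile.record({"tag:spam", "source:unknown"}, "discard")

incoming = [
    {"tag:election", "geo:nairobi"},
    {"tag:spam"},
    {"tag:weather"},
]
ranked = sorted(incoming, key=profile.score, reverse=True)
ask_next = most_uncertain(profile, incoming)
```

With these two decisions recorded, the election item ranks first and the spam item last, while the unseen weather item scores exactly zero, so it is the one the sketch would put in front of the user next.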

Some of this happens on the application side, some of it happens on the logic/cloud side of things. This is because it’s very important that users understand that the platform is there to serve them, and not the other way around: algorithms augmenting human decision making. This means we’ve abstracted some elements of the system logic (the elements that everyone needs to re-use over and over again), while the things specific to the use of the platform are defined in the UI.

Use Cases

We’re really excited to have a number of amazing partners, new and old, using the platform. This includes groups like Newsti.ps, who are building a ‘people’s newswire’ using the Swift products.

There are also some really big uses under way. For instance, this BBC article profiles the PAX system, which is using our platform to power a conflict early warning system. They want to index massive amounts of data from around the world and then use that data to spot historic patterns and trends that can then be used to demonstrate confidence in future patterns.

One of our favorite uses of the Swift platform to date was Product (RED)’s use last year to mash up large quantities of social media activity to power their Turn The World (RED) campaign.

There have been many more uses that we can’t talk about yet, but hopefully those become public soon.

Some Numbers

There are currently eight different code repositories housing the greater Swift project. Each of these API elements is tackled as if it were a single problem. This includes code for location disambiguation, natural language processing, influence detection, reputation monitoring and duplication filtering. You can find more about them here - http://blog.swiftly.org/post/5788873594/resources-for-developers

  • These combined repos contain around 150,000 lines of code (not including frameworks like Kohana)
  • Over 7,000 downloads of Sweeper to date
  • Which theoretically means at least 7,000 users of our APIs
  • Sweeper users tend to aggregate thousands of items of content over the life of a deployment, which means we’ve taken around 70,000,000 items of unstructured data and done things to them like adding location and tags or filtering out duplicates. That’s a very liberal extrapolation, but it gives you a sense of the amount of data we’re dealing with.
  • As the project moves forward, and all our APIs are finally completed, this number will grow exponentially. With RiverID alone (which tracks the reputation of content and people online) we expect to be indexing over half a billion items of content and actions from the social web by the end of the year. That’s just one API; the others will also need to scale on equal terms.
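As a rough check on the extrapolation above, the arithmetic can be written out in a few lines of Python. The download and item totals come from the list. The per-deployment figure is only what those two numbers imply, and the `active_share` parameter is a hypothetical knob added here to show how the total shifts if not every download became a live deployment.

```python
DOWNLOADS = 7_000              # Sweeper downloads to date, from the list above
ITEMS_PROCESSED = 70_000_000   # the post's own rough estimate

# Average items per deployment implied by the two figures above.
implied_items_per_deployment = ITEMS_PROCESSED // DOWNLOADS
print(implied_items_per_deployment)  # prints 10000


def estimate_items(downloads, items_per_deployment, active_share=1.0):
    """Estimated total items processed.

    active_share is an assumption, not a figure from the post: the fraction
    of downloads that turned into active deployments.
    """
    return int(downloads * items_per_deployment * active_share)


print(estimate_items(DOWNLOADS, implied_items_per_deployment))       # prints 70000000
print(estimate_items(DOWNLOADS, implied_items_per_deployment, 0.5))  # prints 35000000
```

So the 70,000,000 figure corresponds to every download running as a deployment that averages 10,000 items, which is why it is best read as an upper-end estimate.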

Photo by Fabrice Florin