Open Source Bookmark Curation

With the latest release of Sweeper, you can roll your own bookmarking service. This becomes really powerful once you start activating plugins like our auto-tagger SiLCC or our Push plugins, which can output all of your bookmarked content as a feed that other applications can consume.

We call this little plugin Quiver. It’s where you manually collect and store information using Sweeper. Essentially, it turns Sweeper into your own free and open-source Delicious clone, with all the contextualization and aggregation features that people have come to love it for.

So how does it work? It’s simple! Just download and install any version of Sweeper following the current release of v0.3.2 which can be found here.

Once you’ve done that, go to the ‘sources panel’.

Select ‘Quiver’ from the list.

Drag the bookmarklet to your browser bar.

Done! Sweeper is a tool for the curation of real-time media. Now the things you find interesting can be mashed up with the content you’re aggregating from the web, Twitter, email and other feeds! It’s particularly useful for journalists or researchers who need real-time content but want to augment it with their own interests and findings.

Get it from Swiftly.org

Introducing Push Plugins

Anyone pulling from the nightly repo may have noticed a cool new feature for the Swift Core that Ahmed wrote last month: our Push Plugin architecture.  This, along with a number of other features, will ship with the next release of Sweeper and the Swift PHP Core.


How Push Plugins Work

Push Plugins allow SwiftRiver applications to acquire content via push (versus pull) commands.  For instance, if a user needs an SMS gateway to submit content to a Swift app, you no longer need to poll the server for that content. Instead, the gateway can tell your app when there’s content by pushing it to the application.
 
This plugin architecture currently supports receiving data through the standard HTTP methods GET and POST.
 
In addition, this architecture can be extended through Push Plugins to support the injection of any kind of data into the system, for example uploading content from files or using bookmarklets such as the Quiver extension.

How to Develop Push Plugins

Locate the Modules/SiSPS/PushParsers folder; this is where you’ll find push parsers. To develop a push parser you will need to do the following:

  1. Create a file named <parsername>PushParser.php (the class name needs to be the same as the file name).
  2. Place the class in the namespace Swiftriver\Core\Modules\SiSPS\PushParsers.
  3. Implement the following methods:
  • PushAndParser($raw_content = null, $post_content = null, $get_content = null)
  • GetDescription() - Returns the text displayed in the Sweeper UI that describes how the parser works
  • ReturnType() - Returns the type that describes what the push parser is all about

The second and third methods are implemented for display purposes so that your parser can be displayed correctly in Swift applications.

The first method is where you write the code that converts the content received by your parser into the Swift object model. This function should also return the converted content to the rest of the SwiftRiver Core once it’s finished.

Depending on the type of resource your parser is listening out for, it will receive the content in one of the three variables $raw_content, $post_content and $get_content.
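
As a rough illustration of the three methods described above, here is a minimal sketch of a push parser. The parser name Example, the "text" field it looks for, and the plain arrays used in place of the Swift object model are all assumptions made for this example; a real parser would build the content classes that ship with the core.

```php
<?php
namespace Swiftriver\Core\Modules\SiSPS\PushParsers;

// Hypothetical parser; the file would be saved as ExamplePushParser.php.
class ExamplePushParser
{
    // Receives pushed data in one of three variables and returns content items.
    public function PushAndParser($raw_content = null, $post_content = null, $get_content = null)
    {
        // Assumption: POST and GET pushes carry the content in a "text" field.
        if (is_array($post_content) && isset($post_content['text'])) {
            $text = $post_content['text'];
        } elseif (is_array($get_content) && isset($get_content['text'])) {
            $text = $get_content['text'];
        } else {
            $text = $raw_content;
        }

        $text = trim((string) $text);
        if ($text === '') {
            return array(); // nothing usable was pushed
        }

        // Stand-in for the Swift object model: one plain array per item.
        return array(array('text' => $text));
    }

    // Shown in the Sweeper UI to describe how the parser works.
    public function GetDescription()
    {
        return 'Accepts text pushed via POST, GET or a raw request body.';
    }

    // The type label for this push parser.
    public function ReturnType()
    {
        return 'Example';
    }
}
```

Checking the POST and GET arrays before falling back to the raw body is just one possible ordering; a parser that only ever listens for one kind of resource can ignore the other two variables.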

Developing Plugins for SwiftRiver Applications

Ahmed Maawy, the newest hire to the SwiftRiver project, recently compiled this great ‘how-to’ guide on writing plugins for SwiftRiver applications like Sweeper and SwiftMeme.  These plugins can mostly be found at http://plugins.swiftly.org while the wishlist for things we’d like to see built can be found here.

For a great example of how Swift plugins work, check out the Ushahidi Report Push plugin, which allows content verified in Sweeper to be passed along to Ushahidi.  Coupled with the Yahoo Placemaker plugin, this is really powerful as it allows all content to pass from Sweeper to Ushahidi, auto-geolocated.

You can view a fully formatted version of this guide on Google Docs.


Before we begin it is worth noting that all SwiftRiver applications have 3 major components:

  • SwiftRiver Core - the engine behind content retrieval, processing and storage.
  • Installer - in charge of initial setup of the SwiftRiver platform.
  • Sweeper - the web application, built on top of the Kohana PHP framework, that provides a UI for the operations performed by the SwiftRiver Core.

There are 3 very important elements to understand for SwiftRiver applications when developing and extending the platform for customized functionality (These 3 elements can be considered as “plugins” for SwiftRiver).

  • Impulse Turbines - Are elements that process and add value to content received from external sources.
  • Reactor Turbines - Are event handlers, and are not necessarily meant to add value to content but to react to specific events within SwiftRiver.
  • Sources - Are parsers for different types of content. They are responsible for retrieving content from the Internet or other relevant sources, and translating this content to SwiftRiver content items, so that content from different sources can all have a uniform format within SwiftRiver.

It is important to note that the SwiftRiver /Modules folder contains a number of these event handlers (Reactor Turbines and Impulse Turbines). However, Sources (also known as Parsers) are developed within the /Modules/SiSPS/Parsers folder.
 
Here is, step by step, how content is received and processed within the core:

  • Parsers take the content from the various external sources, and convert it to the Swift object model.
  • Impulse Turbines may act on the SwiftRiver content items and add value to these content items.
  • Reactor Turbines may be used to work on the content items before they are processed by Impulse Turbines, during their processing cycle, or at any time within the lifetime of the content after specific user actions (such as marking content as accurate).

Parsers / Sources

Parsers are located within the /Modules/SiSPS/Parsers folder and must follow these rules:

  1. The file name must have the format <Parser_Name>Parser.php
  2. The class name has to be the same as the file name
  3. The class must implement IParser
  4. It must be within the namespace SwiftRiver\Core\SiSPS\Parsers
  5. Must contain the following functions:
  • GetAndParse($channel): Returns an array of Content Items
  • ListSubTypes(): Returns the sub types of the Parser
  • ReturnType(): Returns the type of the parser (which has to be the same name you specified in <Parser_Name>)
  • ReturnRequiredParameters(): Returns an array of the parameters required to initiate a single source entry for this parser.

For an example of how parsers work, take a look at how content items for Twitter are generated. Content Items are also passed back together with Source data where available. You may also need to know how the object models for a Channel, Source, and Content are structured. These classes are located within the /ObjectModel/ folder.
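
To make the rules above concrete, here is a minimal sketch of a parser that turns each line of a local text file into a content item. The TextFile name, the "path" parameter, and the plain arrays standing in for the Channel, Source and Content classes are assumptions made for this example and are not taken from the core.

```php
<?php
namespace SwiftRiver\Core\SiSPS\Parsers;

// Hypothetical parser; the file would be saved as TextFileParser.php.
// In a real install the declaration would end with "implements IParser";
// that is left out here so the sketch runs without the core.
class TextFileParser
{
    // Turns each non-empty line of a local file into one content item.
    // Assumption: $channel is a plain array with a "path" entry.
    public function GetAndParse($channel)
    {
        $path = isset($channel['path']) ? $channel['path'] : null;
        if ($path === null || !is_file($path) || !is_readable($path)) {
            return array();
        }

        $items = array();
        foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
            // Stand-in for the Content and Source classes in /ObjectModel/.
            $items[] = array('text' => $line, 'source' => $path);
        }
        return $items;
    }

    // The sub types this parser offers.
    public function ListSubTypes()
    {
        return array('Plain text');
    }

    // Must match the <Parser_Name> part of the file name.
    public function ReturnType()
    {
        return 'TextFile';
    }

    // Parameters needed to set up one source entry for this parser.
    public function ReturnRequiredParameters()
    {
        return array('path');
    }
}
```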

Impulse Turbines

Impulse Turbines are located in the /Modules/ folder and must follow these rules:

  1. Have a <Module_Name>PreProcessingStep.php file name format.
  2. The class name has to be the same as the file name.
  3. The class must implement the \Swiftriver\Core\PreProcessing\IPreProcessingStep class.
  4. Must be in the namespace Swiftriver\PreProcessingSteps 
  5. Contain the following functions:
  • Process($contentItems, $configuration, $logger): Which does processing on the content items.
  • Name(): Returns the impulse turbine name.
  • Description(): Returns the description of this pre-processing step.
  • ReturnRequiredParameters(): Returns an array of required parameters for the pre-processing step.

You may refer to the file GoogleLanguageServicePreProcessingStep.php in the /Modules/GoogleLanguageServiceInterface/ folder for an example.
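
For a smaller illustration than the Google Language Service module, here is a minimal sketch of an Impulse Turbine that adds hashtags found in the text as tags. The Hashtag name, the "text" and "tags" keys, and the choice to return the updated items from Process() are assumptions made for this example; the real content items are objects from the Swift object model.

```php
<?php
namespace Swiftriver\PreProcessingSteps;

// Hypothetical turbine; the file would be saved as HashtagPreProcessingStep.php.
// In a real install the declaration would end with
// "implements \Swiftriver\Core\PreProcessing\IPreProcessingStep";
// that is left out here so the sketch runs without the core.
class HashtagPreProcessingStep
{
    // Adds value to each content item by listing the hashtags in its text.
    // $configuration and $logger are accepted but unused in this sketch.
    public function Process($contentItems, $configuration, $logger)
    {
        foreach ($contentItems as $index => $item) {
            preg_match_all('/#(\w+)/u', $item['text'], $matches);
            // Keep each hashtag once, in order of first appearance.
            $contentItems[$index]['tags'] = array_values(array_unique($matches[1]));
        }
        return $contentItems;
    }

    public function Name()
    {
        return 'Hashtag Tagger';
    }

    public function Description()
    {
        return 'Adds the hashtags found in an item as tags.';
    }

    // This step needs no configuration.
    public function ReturnRequiredParameters()
    {
        return array();
    }
}
```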

Reactor Turbines

Reactor Turbines are located in the /Modules/ folder and must follow these rules:

  1. Have a <Module_Name>EventHandler.php file name format.
  2. The class name has to be the same as the file name.
  3. The class must implement the \Swiftriver\Core\EventDistribution\IEventHandler class.
  4. Must be in the namespace Swiftriver\EventHandlers 
  5. Contain the following functions:
  • HandleEvent($event, $configuration, $logger): Contains the event code.
  • Name(): Returns the name of the event handler.
  • Description(): Returns the description of this event.
  • ReturnRequiredParameters(): Returns an array of required parameters for the event.
  • ReturnEventNamesToHandle(): Returns an array of the event enumerations the turbine intends to handle.

You may refer to the file UshahidiAPIEventHandler.php in the /Modules/UshahidiAPIInterface/ folder for an example.
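
Here is a minimal sketch of a Reactor Turbine that simply counts the content items it is notified about. The Counting name, the "name" and "arguments" properties read from the event, and the literal event name string are assumptions made for this example; in the core, the event names come from the EventEnumeration class.

```php
<?php
namespace Swiftriver\EventHandlers;

// Hypothetical turbine; the file would be saved as CountingEventHandler.php.
// In a real install the declaration would end with
// "implements \Swiftriver\Core\EventDistribution\IEventHandler";
// that is left out here so the sketch runs without the core.
class CountingEventHandler
{
    // Number of content items seen so far.
    public $handled = 0;

    // Reacts to an event; it does not modify the content itself.
    // Assumption: the event exposes "name" and "arguments" properties.
    public function HandleEvent($event, $configuration, $logger)
    {
        if (!in_array($event->name, $this->ReturnEventNamesToHandle(), true)) {
            return; // not an event this turbine listens for
        }
        $this->handled += count($event->arguments);
    }

    public function Name()
    {
        return 'Counting Handler';
    }

    public function Description()
    {
        return 'Counts the content items that finish post processing.';
    }

    // This handler needs no configuration.
    public function ReturnRequiredParameters()
    {
        return array();
    }

    // Stand-in string; the core would use an EventEnumeration value here.
    public function ReturnEventNamesToHandle()
    {
        return array('ContentPostProcessing');
    }
}
```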

Important notes to consider during the EventDistribution phase 

  1. The ReturnEventNamesToHandle() function points to an enumeration from the /EventDistribution/EventEnumeration.php file. This is where you can design your own custom enumeration.
  2. It is most appropriate to place event handlers within the application’s workflow. Application workflows are placed within the /Workflows/ folder. For example, all workflows related to channel activities are placed within the /Workflows/ChannelServices/ folder.
  3. The example below demonstrates how you would raise an event at a specific point in the workflow:

    // Wrap the processed content in an event of the chosen type.
    $event = new \Swiftriver\Core\EventDistribution\GenericEvent(
        \Swiftriver\Core\EventDistribution\EventEnumeration::$ContentPostProcessing,
        $processedContent);

    // Hand the event to the distributor, which passes it on to the handlers.
    $eventDistributor = new \Swiftriver\Core\EventDistribution\EventDistributor();
    $eventDistributor->RaiseAndDistributeEvent($event);

Please feel free to contact the SwiftRiver team for any further assistance. You can reach us by emailing support@swiftly.org.

U St. Brainstorming Session

Patrick Meier and some friends and users of Swift stopped by Affinity Labs in Washington a few days ago with some great suggestions and feature requests for the next release of our Sweeper application.  Our work centered largely on rethinking user interaction options.  It was an exciting day and we’re really looking forward to incorporating these suggestions in our next release.

Check out the slideshow for some shots of our brainstorming session and the image below for a sneak peek at the proposed redesign.

SwiftRiver Dataflow Infographic

I’m often asked about the architecture of the SwiftRiver platform. So much has been written, said and presented about it to date that I thought I’d take a different approach.  Rather than bore you with another long blog post, I’d like to share some visuals that explain the system.

PDF | Video | High-Res Image


If the images above are too small, try downloading the PDF version or the high-res image, or watch the video below.

Don’t forget to vote for us in the Knight Challenge!

GeoDict Joins the SwiftRiver Initiative!

SwiftRiver is an opensource project with the overarching goal to help people make sense of data on their terms. We do this by adding all types of elusive context to content: tags, predictions for accuracy, indicators of influence and location etc.

Location is incredibly important to us because one of our many objectives is to help users verify data, and location often serves as a clue about whether content is accurate or not.  For example, in this post Vladimir Ermakov describes how Swift attempts to auto-detect the location that news articles refer to using statistical analysis of text.  That algorithm needs a database of locations to work, however.

Particularly in the case of crisis-mapping, this is key. In applications like the Ushahidi platform, people can aggregate ‘reports’ about events and visualize that data geo-spatially. Because that data comes from the crowd, and because it all needs to be location based (for visualization), it’s critical that the location appended to the message be accurate…or at least as accurate as possible.

So contextualizing crowdsourced data through location is a huge priority. Another priority is ensuring that our platform works much the same offline as it does online. This means we want to ensure that our products rely primarily upon other open source projects whose source code can be deployed on a local machine or behind a firewall.

Recently we realized we were beginning to rely upon Yahoo’s Placemaker service for our location detection features. Yahoo is great, but relying upon such a huge, proprietary product cripples access for some users. We spent several months thinking about building our own alternative (see: SULSa), but ultimately it proved beyond our resources. So we set out to find an opensource alternative to Placemaker and we found one in the form of the GeoDict project.

GeoDict is an opensource project for pulling location information from unstructured text. Given our recent experiments in this same area, we found the GeoDict project inspiring. So although it was an active project with a growing community, we invited the people behind it to allow SwiftRiver to officially adopt the codebase.

What does this mean?

We’re not entirely sure, but there are some things we do know.  Both projects will remain available under the GPL. You’ll see us contribute our staff, time and resources to the development of GeoDict (because it’s an open source project aligned with our greater mission). GeoDict’s community will also actively contribute back to that code, and hopefully they’ll feel welcome enough to contribute to the SwiftRiver and Ushahidi code bases as well.

GeoDict will be fully integrated into the Swift Web Services family of API products which we offer as both free and paid services, but also as open-source code for anyone out there to use on their own terms.

Big thanks to Pete Warden for creating GeoDict and for supporting our project. Welcome to the Ushahidi family!

Localizing News

The following post was written by a volunteer developer, Vladimir G. Ermakov, a Master’s student at Carnegie Mellon University in Pennsylvania. Over the past few months he took on an ambitious project: to contribute code that would allow us to parse news articles and attempt to auto-detect the primary location that is the subject of any given text.


Localizing News by Vladimir Ermakov

The amount of information available in electronic format is rapidly increasing. It is becoming possible to find out in real time about current events in a particular part of the world based on electronic data such as news articles, blog entries, Twitter feeds and SMS messages. Even though the data is available, there is an overwhelming amount of it and it is hard to stay on top of the events that are relevant. Getting informed about recent developments is particularly important in times of crisis, when lives could depend on a timely response. In this project I am exploring ways to pinpoint the location discussed in text documents. I am able to achieve good results by combining location keywords extracted by the Yahoo! Placemaker service with state-of-the-art machine learning and natural language processing techniques.

The basic approach that I’ve embarked upon is to extract location keywords from a document using the Yahoo Placemaker service, and then apply classification techniques to disambiguate which of these locations is most relevant to the document at hand. I’ve conducted experiments with Naïve Bayes and Fisher classifiers using a bag-of-words model for feature extraction, but these did not give good results. I explored an alternative approach: use the count and position of location keywords extracted by Placemaker and feed them into an SVM. This proved to be a very effective way of determining the country that is the focus of the document. Applying lemmatization to location adjectives such as Russian and converting them to nouns such as Russia helped improve the results even further.

While Reuters-21578 was a great dataset to use for training classifiers and experimenting with the data, the articles there were collected 20 years ago. What made this project interesting for me is the possibility of visualizing the news around the world on a map, and seeing whether a sudden rise in the number of articles published can be an indicator of some important event.

To make this possible I had to obtain a recent dataset. Reuters has archived articles from the last several years on their website. I developed a simple crawler that visited news articles from this archive, downloaded them to my server, and extracted the news article text content. I then passed this content off to the Yahoo Placemaker service, and output the data with the location labels into XML files. I then could use my scripts to run the experiments on this new dataset, just like I did with the original data.

I limited my data collection to the most recent articles. The archive contained over 400,000 news articles for 2010, which was too many to download. I restricted the crawler to randomly pick 10% of the articles from each day of the year. This was still a significant amount of data, 80,000 articles, and fairly representative of the whole archive.

After all the experiments I was able to narrow in on a working solution for mapping news articles: extract location information from the article using the Yahoo Placemaker service, making sure to lemmatize location adjectives; extract the normalized count and position of location keywords within the article; and apply an SVM classifier to decide which of these locations is most important to the article. The results were encouraging, and I believe this solution is ready to deploy in a real-world application. I am hoping to implement an extension to the SwiftRiver platform in the near future that uses this method to classify news articles by country.
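
The feature extraction step described above might look roughly like the sketch below. It is not the code from the paper: the function name, the adjective lookup table, and the way count and position are normalized (share of all location mentions, and offset of the first mention divided by text length) are assumptions made for illustration. The location keywords would come from Placemaker, and the resulting numbers would then be passed to an SVM, which is not shown.

```php
<?php
// For each candidate location, computes its share of all location mentions
// and the relative position of its first mention (0 = start of the text,
// 1 = never mentioned).
function locationFeatures($text, array $locations, array $adjectives = array())
{
    // Map location adjectives to their noun form, e.g. "Russian" to "Russia".
    foreach ($adjectives as $adjective => $noun) {
        $text = preg_replace('/\b' . preg_quote($adjective, '/') . '\b/u', $noun, $text);
    }

    $length = max(1, strlen($text));
    $counts = array();
    $firstOffsets = array();
    foreach ($locations as $location) {
        $pattern = '/\b' . preg_quote($location, '/') . '\b/u';
        $counts[$location] = (int) preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
        // $matches[0][0][1] is the byte offset of the first match.
        $firstOffsets[$location] = $counts[$location] > 0 ? $matches[0][0][1] : $length;
    }

    // Avoid dividing by zero when no location is mentioned at all.
    $total = max(1, array_sum($counts));
    $features = array();
    foreach ($locations as $location) {
        $features[$location] = array(
            'count' => $counts[$location] / $total,
            'first_position' => $firstOffsets[$location] / $length,
        );
    }
    return $features;
}
```

For a sentence such as "Russian troops left Georgia. Georgia responded.", this gives Georgia the larger share of mentions and Russia the earlier first position, which are the two kinds of signal the classifier would weigh against each other.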


Vladimir’s paper is a much longer, and much more fascinating, read than I could share here. If you’d like to read it, he can be reached by emailing vermakov [at] emu [dot] edu.

We’re working on folding this and other contributions into the next release.  Thanks for the awesome work Vladimir!  Other developers interested in contributing to the Swift platform can find out more here.

Better Living Through Crowdsourcing

Christian Kreutz explores the many technologies the world is using to make sense of real-world data in the digital domain. These technologies, apart and collectively, enable computers to more accurately interpret the world as we understand it, in the hope that they’ll be able to tell us more about our reality than we are able to infer unaided.

Our relationship with these technologies is self-reinforcing: it’s both driven by, and the cause of, an explosion in the ‘sharing’ of content. In other words, the more data we have, the more we want to understand and contextualize it. The more we understand, the greater the motivation to create and share even more.

The Information Age, Amplified

Eric Schmidt, CEO of Google, recently talked about just how fast humans are creating content:

Thanks to the Internet, we now double every two days all stored information. The estimated amount is 5 exabytes according to Eric Schmidt (Google) and it took human kind 2000 years to get a similar amount of archived information.

So how are machines able to parse all this data from the real-world? Well, there are a few ways…

  • Text Recognition and Natural Language Processing
  • Voice Recognition
  • Mobile Data Collection
  • Image Processing and Computer Vision

That’s a few, but also consider a number of other technologies: programs for mining the social graph, mapping, checking in, active learning…too many to list. The point is, the sum of these parts allows for platforms that attempt to understand media as close to the way humans do as possible. Of course, the benefit of computing is that algorithms work faster and more efficiently than we do. Despite the number of technologies listed above, artificial intelligence isn’t quite where it needs to be to completely automate managing it all.

Just today there were reports that Cuil, a search engine that relied upon semantic parsing algorithms to mine the dark web, might be shutting down. I’m sure their technology was sound and some of the brightest minds in the business started Cuil, but there are real difficulties in relying on machines to do complex tasks where context is the variable.

Crowdsource the Filter

Our approach is to address the problem from a different angle, where humans can distribute work to many, use machines to aggregate the output of that productivity, and then work with smart tools that learn from the users’ needs and expectations. If our code isn’t smart enough to make sense of data on its own (it’s not) but humans are (yet they aren’t as fast or organized), then perhaps part of the solution lies in optimizing human efforts at filtering content, adding context and using the result as the base for improving future algorithmic decisions. This is called active learning, where the interactions of a human operator improve the algorithms assigned to perform certain functions.

My colleague Patrick Meier refers to this as Crowdsourcing the Filter. I think, at least in the near term, this is the future of intelligent computing, where smart machines assist humans, helping us accomplish the tasks we need to accomplish more efficiently.

At CrowdConf next month on October 4th, SwiftRiver will be onsite demonstrating some of the applications we’ve built from this understanding. This is part of our approach to solving the problem of ‘too much data’. We’ll let the big guys like Google, Microsoft and IBM figure out the secrets to scalable AI. In the meantime, our goal at SwiftRiver is to democratize access to tools that help people make sense of data, on their terms.

SwiftRiver Web 101 | Sept 23

Are you interested in the SwiftRiver platform?  Do you want to get a better understanding of our products or find out how to install them?  Well, we’re slowly catching up with demand for documentation, instruction and new features etc.  In the meantime, we cordially invite you to sit on your couch, in your jammies, with a big bowl of Lucky Charms and soymilk (that’s what I’ll be doing) and attend the first ever SwiftRiver Web 101 training seminar.

The event will be held Thursday September 23 8:00am - 10:00am PST/GMT-8

This event is free but there are only 15 slots for potential attendees, so sign up quickly.  We can also accommodate multiple people from the same organization.

We’ve blocked out two hours from our schedule to make sure you can ask all the questions you could possibly ask, whether it’s about installation, security, APIs, code, developing plugins or SwiftApps, integration with Ushahidi/Crowdmap and more.

Planned Topics

  • Brief overview of our web APIs
  • Brief overview of system architecture
  • Sneak peek at new apps
  • Installing the Sweeper app
  • Using the Sweeper app
  • Q&A

We’ll also debut three new features for the Sweeper app and release the next build (including the new features) early, to all the people in attendance! To attend, visit the following link - Click Here to Register for SwiftRiver Web 101