Translating Realtime Social Media

One of the problems a lot of crowdsourcing projects have is that they end up pulling in massive amounts of data from the web, Twitter and other channels from around the world. This means content arrives in many different languages, often languages that the deployer doesn’t speak.

Currently in Sweeper, and soon in Ushahidi, users can translate real-time content from one language into another on the fly, as they receive it. This is done using our Google Translate plugin, which currently supports 50+ languages.

In the Sweeper deployment we’re running internally to monitor the situation in Japan, we rely on this feature to follow events, since we can’t manually translate every single message coming through. We’ve found it a significant timesaver. As you can see below, we also show the user which language a message was translated from, or whether it has been translated at all…

Before:

After:

It’s important to understand that this is machine translation, so it’s far from perfect. But if you’re monitoring feeds from multiple countries across Twitter, RSS, email or SMS, it’s sometimes useful enough to get a quick sense of what’s being said, where to look for more info, or where to direct human translators.
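The "translated from, or not translated at all" labelling described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `DisplayMessage`, `prepare_for_display`, and the `detect` / `translate` callables are hypothetical names, and the toy stand-ins at the bottom only exist so the sketch runs without a translation service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DisplayMessage:
    text: str                       # text shown to the user
    translated_from: Optional[str]  # source language code, or None if untouched


def prepare_for_display(
    text: str,
    target_lang: str,
    detect: Callable[[str], str],
    translate: Callable[[str, str, str], str],
) -> DisplayMessage:
    """Translate only when the detected language differs from the target."""
    source = detect(text)
    if source == target_lang:
        return DisplayMessage(text, None)
    return DisplayMessage(translate(text, source, target_lang), source)


# Toy stand-ins (assumptions): a real deployment would call a translation
# service here instead of a lookup table.
def toy_detect(text: str) -> str:
    return "ja" if any(ord(ch) > 0x3000 for ch in text) else "en"


def toy_translate(text: str, source: str, target: str) -> str:
    return {"地震": "earthquake"}.get(text, text)


msg = prepare_for_display("地震", "en", toy_detect, toy_translate)
print(msg.text, "| translated from:", msg.translated_from)
```

Keeping the source language alongside the translated text is what lets the interface flag machine-translated messages, so a reader knows when to treat the wording with caution.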

Summer of Swift: Soe

The Google Summer of Code has ended. This year SwiftRiver was a mentoring organization and we wanted to give our GSoCers some ‘face’ time on the blog by interviewing them. Soe is a developer who worked on our distributed reputation product, RiverID.

Interview with Soe

Q: What is your educational (or professional) background?

Bachelor in Mechanical Engineering. Starting Master’s in Sustainable Engineering this academic year. I like to code for fun and for social wellness.

Q: The project you were working on was called ____________. Why did you select that as your GSoC project and what did you learn from working on it?

RiverID. I find it challenging as I am required to build a scalable platform interacting with remote Swift instances from scratch. The project involves technical challenges to work with NoSQL for scalability and OAuth and REST for interacting with remote Swift instances. Additionally, it involves working with human psychology to determine and reward useful contributions towards Swift instances.

Q: What challenges did you run into during the project, and how did you overcome them?

Working with cutting-edge technologies meant that not many reading materials and examples were available, so it involved a lot of trying things out and experimenting.

Q: GSoCers get to choose the organizations they work with, why did you choose to work with SwiftRiver?

I am much impressed by the social impact that Ushahidi has. As SwiftRiver would add more value to Ushahidi, I wanted to contribute something to this upcoming platform.

Q: Any closing remarks?

I will continue contributing to RiverID till it is functional. As I have keen interest in developing journalism tools for rural areas, my experience working with RiverID would be very useful for my future project.

Summer of Swift: Nishith Rastogi

The Google Summer of Code is drawing to a close. This year SwiftRiver was a mentoring organization and we wanted to give our GSoCers some ‘face’ time on the blog by interviewing them. Nishith is a developer who worked on our natural language processing API, SiLCC.

Interview with Nishith

Q: What is your educational (or professional) background?

I am currently pursuing an M.Sc. (Hons) in Economics and a B.E. (Hons) in Electronics and Electrical Engineering at BITS-Pilani, Goa Campus, India.

Q: What was the project you worked on at Swift?  Why did you select that as your GSoC project and what did you learn from working on it?

I was working on improving the existing features of, and adding more functionality to, the SiLCC component of the SwiftRiver project. I selected it because I wanted to work in the area of Natural Language Processing and hone my skills in this domain. I learnt a lot in the field of NLP and also got my feet wet in databases and web frameworks. Neville helped me a lot during my entire summer.

Q: What challenges did you run into during the project, and how did you overcome them?

I am primarily a back-end engine guy and deal mostly with command-line applications; also, to date I have mostly programmed on my own. So dealing with web frameworks, collaborative coding and code conventions was completely new to me. It was brilliant to learn these essential skills. The official documentation of the respective technologies, Google and my mentor provided the necessary and critical guidance.

Q: GSoCers get to choose the organizations they work with, why did you choose to work with SwiftRiver?

I specifically wanted to work only in the area of Natural Language Processing, and SwiftRiver was one of the very few organizations offering me an opportunity to do so. Along with that, the social impact and applications of Ushahidi/SwiftRiver excited me.

Q: Any closing remarks?

Machine Learning and Textual Analysis have come a long way since the point where a plain text search was the epitome of Data Mining. I would strongly recommend and encourage everyone to work on or try tools like Ushahidi/SwiftRiver. You will have an opportunity to create an end-to-end impact, interact with a brilliant team and learn a lot. I would sincerely like to thank the Ushahidi/SwiftRiver team for providing me with this opportunity.

Summer of Swift: Mang-Git

The Google Summer of Code is drawing to a close. This year SwiftRiver was a mentoring organization and we wanted to give our GSoCers some ‘face’ time on the blog by interviewing them.  Mang-Git is a developer who worked on our influence analytic application, Reverberations.  

Reverberations is a pretty simple app: a RESTful solution for determining the influence of content online. Mang worked on the Twitter portion, where an algorithm returns the number of retweets an item has, and then the number of retweets those retweets have as well.
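As a rough illustration of that two-level count, here is a minimal sketch. It is not the Reverberations code: the function name and the `children` mapping (item id to the ids of its direct retweets) are assumptions made for the example, and the mapping is taken as already known rather than fetched from Twitter.

```python
from typing import Dict, List, Tuple


def count_reverberations(children: Dict[str, List[str]], item: str) -> Tuple[int, int]:
    """Return (direct retweets of item, retweets of those retweets)."""
    direct = children.get(item, [])
    second_level = sum(len(children.get(rt, [])) for rt in direct)
    return len(direct), second_level


# Hypothetical data: t1 was retweeted as r1 and r2, which were retweeted again.
children = {
    "t1": ["r1", "r2"],
    "r1": ["r3"],
    "r2": ["r4", "r5"],
}

direct, second = count_reverberations(children, "t1")
print(direct, second)  # 2 direct retweets, 3 second-level retweets
```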

Reverberations is an open source project which can be found here: http://github.com/appfrica/Reverberations

Meet Mang-Git

Q: What is your educational (or professional) background?

I am a recent graduate of the University of Michigan, with an undergraduate degree in Computer Engineering. Over the past couple summers I have interned at Cisco Systems, as a Software Development intern. During the school year I worked as a computer technician for the School of Social Work at U of M.

Q: What was the project you worked on for us? Why did you select that as your GSoC project and what did you learn from working on it?

The specific project I worked on was called Reverberations, an add-on to the SwiftRiver tool set. I chose to work on SwiftRiver because of the far-reaching social implications the software had. From my time working on Reverberations I have learned a lot about social networking, the REST architecture, as well as how to use the Twitter API to gather the information I needed.

Q: What challenges did you run into during the project, and how did you overcome them?

The largest challenge I came across was determining how to create a retweet tree. Twitter does not provide information on whether a retweet is a retweet of another retweet.

example…

All retweets on Twitter are displayed as retweets of the original tweet. To make matters even more difficult, I did not notice until the last week of GSoC that the status_retweets API call returned all retweets as retweets of the original tweet. Eventually, with some help from the Twitter developer forum, I came up with an algorithm to create a retweet tree, using retweet timestamp information and info on the followers of a user. The most difficult part about developing this “algorithm” was figuring out what the implications are for each piece of information, as well as how to handle situations when it is impossible to determine who the likely parent tweet is. In the end I’ve created an algorithm that, although it cannot always be 100% sure the retweet tree is correct, is better than any other solution I have seen, especially since I have yet to find an application that even attempts to create a retweet tree using information from the Twitter API.
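The interview does not spell out the algorithm, so the following is only one plausible heuristic built from the same two ingredients (timestamps and follower information), not Mang-Git's actual method. It assumes a retweeter most likely saw the tweet from the most recent earlier poster they follow, and it returns `None` when no such poster exists. The names `Retweet`, `infer_parents`, and `following` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Retweet:
    user: str
    timestamp: int  # seconds since epoch


def infer_parents(
    original_author: str,
    retweets: List[Retweet],
    following: Dict[str, Set[str]],  # user -> accounts that user follows
) -> Dict[str, Optional[str]]:
    """Guess each retweeter's parent; None means it cannot be determined."""
    parents: Dict[str, Optional[str]] = {}
    earlier = [original_author]  # everyone who has posted the tweet so far
    for rt in sorted(retweets, key=lambda r: r.timestamp):
        followed = following.get(rt.user, set())
        candidates = [u for u in earlier if u in followed]
        # Assumption: the most recent followed poster is the likely source.
        parents[rt.user] = candidates[-1] if candidates else None
        earlier.append(rt.user)
    return parents


retweets = [Retweet("carol", 20), Retweet("bob", 10), Retweet("dave", 30)]
following = {"bob": {"alice"}, "carol": {"bob"}, "dave": set()}
parents = infer_parents("alice", retweets, following)
print(parents)
```

Returning `None` instead of defaulting to the original author keeps the undeterminable cases visible, which matches the caveat that the tree cannot always be known to be correct.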

Q: GSoCers get to choose the organizations they work with, why did you choose to work with SwiftRiver?

SwiftRiver, along with Ushahidi, not only provides crucial information to rescuers during a crisis, but can also provide realtime, crowdsourced news to the general public. An example of this would be using Ushahidi and Reverberations in a situation such as the riots in Iran, allowing reporters to filter and manage the huge amounts of user-generated content flowing out of Iran, and thus helping spread the news from Iran even when Western reporters were not allowed into the country.

Q: Any closing remarks?

GSoC is a great program, and provides an excellent opportunity for students like myself to participate in open source projects, and be rewarded for giving back to the open source community. My participation with SwiftRiver has taught me a lot about the software development process. I have had to do everything from managing my time, to defining a spec from a concept idea, to learning how to use the tools needed for developing my program, and finally actual development and testing of my code. Working with Jon and GSoC has really been an excellent experience; I feel that I have learned a lot and grown as a developer.

Thanks, Mang-Git!

Ushahidi’s Google Summer of Code Projects

From its earliest days Ushahidi has been an open source project that people from all over the world have contributed to. Thus, it was our pleasure to find out we were accepted into the Google Summer of Code as a mentoring organization this week.

About Google Summer of Code

Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects. We have worked with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together nearly 2500 successful student participants and 2500 mentors from 98 countries worldwide, all for the love of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all.

Ushahidi GSoC Projects

For projects we’re working on or ideas on what to contribute as part of the GSoC program, please visit http://swift.ushahidi.com/extend/ideas/.

For Potential Applicants

For people interested in participating in Ushahidi’s Summer of Code projects, please review our organization’s projects here and then fill out the application form here.

Other Relevant Links

mailing list - http://groups.google.com/group/swiftriver
IRC channel - irc://irc.freenode.net/#ushahidi
skype public - click here
facebook - http://www.facebook.com/pages/Swiftriver/362720609137

Mashing Up Google Sky and Ushahidi

Like any software tool, Ushahidi can be repurposed for uses other than what it was designed for. The other day I was considering some alternative uses for the Ushahidi platform, for situations beyond the ‘crisis’ scenarios for which it was initially conceived. In reality, Ushahidi could be used for any sort of location-based reporting. If someone wanted to, they could use it to map reports of car accidents, holiday sales, or celebrity sightings.

I happen to be enthusiastic about astronomy and I couldn’t imagine a more perfect scenario than using Ushahidi to help professionals and hobbyists alike monitor the stars. Enter the concept art below for an app I dubbed…SPACE MONITOR.



Before I continue, I should note that Space Monitor does not exist (yet). It’s entirely possible technically, but the screenshot above was created using an Ushahidi installation, Google Sky and Photoshop. Google actually offers Google Sky tiled maps online, so I conceived this as a project that would bring the two together. The idea is that it would aggregate scientific reports and amateur discoveries to allow the astronomy community to monitor discoveries, abnormalities and interesting facts about space.

Let’s take a closer look shall we?



Figure A.

In the image above you see a tiled map of the stars. In a crisis monitoring Ushahidi instance, the tiled maps are of a place on earth where the size of the circles represents the frequency of reported incidents of violence in any given area. In the case of monitoring space, you have the same basic concept. However, instead of reporting incidents of violence, Space Monitor tracks irregularities of measurements recorded in different regions of the sky.

This is how professional scientists make new discoveries in space. When measurements differ from what is expected, they scrutinize them to see whether they can determine the cause of the irregularities. If the cause is something previously unrecorded, then it’s documented as a new discovery or find. Recently scientists at the European Southern Observatory reported the discovery of 32 new planets outside of our solar system. A typical method for discovering planets is to measure the magnitude, or brightness, of the stars that are ‘behind them’. As the planet crosses between our vantage point and a star, scientists can measure the change in magnitude and determine what might be the cause. Sometimes the cause turns out to be an orbiting planet, sometimes a nearby asteroid or other celestial object.
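To give a sense of how small that change in magnitude is, here is a back-of-the-envelope sketch. It assumes an idealized transit: a uniformly bright stellar disk with the planet's disk fully in front of it, in which case the fraction of light blocked is the ratio of the two disk areas. The function names are made up for this example, and the radii are approximate mean values for Jupiter and the Sun.

```python
import math


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fraction of starlight blocked: ratio of the two disk areas."""
    return (planet_radius_km / star_radius_km) ** 2


def magnitude_change(depth: float) -> float:
    """Dimming in magnitudes for a given fractional drop in flux."""
    return -2.5 * math.log10(1.0 - depth)


JUPITER_RADIUS_KM = 69_911.0  # approximate mean radius
SUN_RADIUS_KM = 695_700.0     # approximate radius

depth = transit_depth(JUPITER_RADIUS_KM, SUN_RADIUS_KM)
delta_mag = magnitude_change(depth)
print(f"flux drop: {depth:.2%}, magnitude change: {delta_mag:.4f}")
```

Under these assumptions a Jupiter-sized planet crossing a Sun-sized star dims it by only about one percent, which is why such irregularities need careful measurement.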

With a tool like Space Monitor, anyone enthusiastic about astronomy could learn about all sorts of recent discoveries from one central location. This might serve as inspiration for more research, or it might provide the tip that allows an amateur enthusiast with a telescope to make a discovery.



Figure B.



Figure C.



Figure D.

This would be done by following a feed of reports like the ones above. In Figure B the feed comes from a module that tracks ‘irregularities’ reported by different scientific instruments. This information is often hard to get, but if organizations like ESO and NASA offer such reports via their websites or Twitter accounts, then with the proper meta information these reports can be mapped. In Figure A these fake reports are represented by the white ellipses superimposed over stars. The more activity (irregularities) from a particular region of the sky, the larger the circle. This would also include reports submitted by scientists and amateurs all over the world. Of course these would all need to be curated by an in-house team that would verify their authenticity.
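Since Space Monitor does not exist, the following is purely a sketch of how "more activity, larger circle" might work. It assumes each report carries sky coordinates (right ascension and declination in degrees) as its meta information, groups reports into fixed-size grid cells, and scales the circle radius with the square root of the count so that circle area grows in proportion to activity. All names and the cell size are hypothetical choices for the example.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Cell = Tuple[int, int]


def bin_reports(reports: Iterable[Tuple[float, float]], cell_deg: float = 10.0) -> Counter:
    """Count reports per sky cell; each report is (ra_deg, dec_deg)."""
    counts: Counter = Counter()
    for ra, dec in reports:
        # Shift declination from [-90, 90] to [0, 180] before binning.
        cell: Cell = (int(ra // cell_deg), int((dec + 90.0) // cell_deg))
        counts[cell] += 1
    return counts


def circle_radius(count: int, base_px: float = 4.0) -> float:
    """Radius in pixels; area is proportional to the number of reports."""
    return base_px * math.sqrt(count)


# Hypothetical reports: two close together, one elsewhere in the sky.
reports = [(10.5, -5.0), (12.0, -3.0), (200.0, 45.0)]
counts = bin_reports(reports)
for cell, n in sorted(counts.items()):
    print(cell, n, round(circle_radius(n), 2))
```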

In Figure C, the module would track official reports (press releases, news stories, articles, etc.) related to astronomy. If you look closely, in my mock-up app, a series of reports from Figure B have led to the announcement of a discovery in Figure C. The timeline in Figure D shows the frequency of reports on any given day.

Right now Space Monitor is purely conceptual, but hopefully with the right involvement from professional or amateur astronomers, we’ll see Ushahidi applied to such scenarios in the very near future!