The Newsblaster project was developed by the Columbia NLP (natural language processing) Group and has been running since September 2001. Under the direction of Professor Kathleen McKeown, the site applies natural language processing and artificial intelligence techniques to news stories to produce summaries of the day’s top news. After 11 years at the helm, Professor McKeown spoke to the SFN blog about the development of Newsblaster and what the future holds for the aggregation site.
SFN: In the 11 years that Newsblaster has been running, what types of changes has the site undergone?
KM: Initially, Newsblaster was tasked only with gathering, aggregating and summarising news. But then we added the ability to contrast different points of view on the same event, to track events over multiple days and to highlight in our summaries what is new compared with what was originally presented. For a while we did have a multilingual version of Newsblaster, which I really liked a lot, but it was harder to maintain, because webpages in different languages change in different ways, so you really have to be constantly working on it. We would draw on sources in 10 or 15 different languages and then use online translators to present the summary page in English. So you would have a page of all the news in the world in English, but it was possible to drill down to the source, whether French, Spanish or Italian, to see the original language if you wanted to.
SFN: How long did the multilingual version last for?
KM: For a few years, four years or so. It basically lasted for as long as I had a PhD student working on it.
SFN: Given the interest that algorithms and machine learning hold for journalists, has Columbia’s journalism school ever shown any interest in collaborating on the project?
KM: Yes, I’ve been invited to classes there to speak about it. It depends on people’s research interests. We have two things going on right now. First, we have a joint degree in Computer Science and Journalism, which started last year; basically, it’s for digital media. Recently we’ve spent some time talking with Emily Bell, who is relatively new at the journalism school and very interested in digital media. Second, we have a new institute on data sciences and engineering, which is interdisciplinary and involves multiple schools across the campus. The agreement has just been signed, so it’s in the process of starting up, and the journalism school is involved and will be collaborating on it.
SFN: Have you had a mainly positive reaction to Newsblaster from the world of journalism?
KM: For the most part, yes; people are interested. I think journalists want to make sure that a programme like this is not going to take over their jobs. And it’s not. It’s an automatic program; it does make errors, but that’s part of the point. When we first started it up, people were quick to point out the errors. Right now, I just barely maintain it because I don’t have funding for it. It needs some work to bring it up to speed.
SFN: Are there any further updates planned for Newsblaster? And if you were to get more funding, what would you do with it?
KM: I would definitely be interested in going back to the multilingual version and working on that further. One of the things we’re doing now is using the summaries we have built up over time to do other kinds of things. For example, we’re using all the summary-article pairs to generate data for answering questions about events. In the future we hope to do more work on question answering, developing a machine learning approach to it. We’re also using Newsblaster to track events over time, to look at correlations between what’s happening in the news and what’s happening in social media, how people’s opinions change, and whether we can correlate those changes with new events. We’re doing that in particular with political campaigns.