Big data is 2012’s big buzzword, as many predicted it would be.
The catchy term does not simply denote vast amounts of information, but refers to the emerging technologies developed and employed to gather, process and analyse the tidal wave of new data. “Big Data is really about new uses and new insights, not so much the data itself,” says Rod A. Smith, IBM’s vice president for emerging Internet technologies, in an interview with The New York Times. Human beings have produced more data in the last two years than at any other time in history: according to IBM’s calculations, 90% of the data currently in existence was created between 2010 and 2012, thanks to social media posts and interactions, online-browsing history, web-purchase receipts, GPS systems, and sensor and surveillance data. The revolutionary aspect of ‘Big Data’ lies in the way in which it forms links and associations between seemingly disparate facts, leading to new perspectives and revelations.
In an article entitled ‘How Big Data Became So Big’, the NYTimes suggests that the term has shed its ‘geeky’ origins and crossed over into the mainstream. The ‘big data’ epithet was first coined in 2008, and applying it to data mining and machine learning has proven to be an effective marketing strategy. Not only does the vagueness of the term permit it to encompass ever greater volumes of data, but its simplicity also makes artificial intelligence seem more accessible and approachable - which could be good news for news companies reliant on algorithms and computer learning. Organisations like ProPublica are already harnessing data analysis tools and systems in order to produce “journalism in the public interest”, and normalising the use of sophisticated software could prepare the way for an influx of other news-media start-ups harnessing big data in a similar fashion.
The demystification of data handling could also be good news for (human) data journalists. In an excerpt from the Data Journalism Handbook, contributor Michael Blastland touches upon the sense of fear and uncertainty many journalists experience when having to work with data: “The best tip for handling data is to enjoy yourself. Data can appear forbidding. But allow it to intimidate you and you'll get nowhere. Treat it as something to play with and explore and it will often yield secrets and stories with surprising ease.”
Data, whether in the form of Twitter posts or government documents, is fast becoming a journalist’s first port of call, meaning that computer science skills are becoming indispensable for reporters. A report produced by the Knight Foundation revealed that 75% of journalists felt that data analysis training “would offer great (or very great) benefit”, and the growing number of joint Computer Science and Journalism degrees points to a future when all aspiring reporters will be data-literate.
The problem is that, thus far, data journalism has not had a significant impact on reporting. The dire economic situation in many newsrooms means that reporters are denied the tools and training they need to tackle a data-led future.
With big data only set to get bigger, the next challenge facing news titles may well be balancing the short-term need for reduced production costs with the necessity of investing in new tools and training for the future.
Sources: The New York Times, ReadWriteWeb, Poynter, The Guardian, ProPublica, Editors Weblog



