Media Cloud



Media Cloud is an open-source content analysis tool that aims to map news media coverage of current events. It "performs five basic functions -- media definition, crawling, text extraction, word vectoring, and analysis." Media Cloud "tracks hundreds of newspapers and thousands of Web sites and blogs, and archives the information in a searchable form. The database ... enable[s] researchers to search for key people, places and events — from Michael Jackson to the Iranian elections — and find out precisely when, where and how frequently they are covered." Media Cloud was developed by the Berkman Center for Internet & Society at Harvard University and launched in March 2009. It is distributed under the GNU General Public License, version 3 or later (GPL 3+).

As of October 2011, Media Cloud tracked news mostly from U.S. sources. It "collects news stories" from the following sets:
 * "Top 25 mainstream media sources from the U.S. according to the Google Ad Planner service" (including The New York Times and the BBC)
 * "1000 most influential U.S. political blogs according to Technorati" (for example, Outside the Beltway)
 * "1000 most popular feeds in Bloglines" (such as Gawker)
 * "All public feeds from whitehouse.gov"

Organizations that have collaborated with Media Cloud, in the past or currently, include Morningside Analytics, Betaworks, Bit.ly, the Associated Press and Global Voices.

What Media Cloud does
On May 6, 2011, the Berkman Center relaunched Media Cloud, "a platform designed to let scholars, journalists and anyone interested in the world of media ask and answer quantitative questions about media attention. For more than a year, we've been collecting roughly 50,000 English-language stories a day from 17,000 media sources, including major mainstream media outlets, left and right-leaning American political blogs, as well as from 1000 popular general interest blogs." The data was used to "analyze the differences in coverage of international crises in professional and citizen media and to study the rapid shifts in media attention that have accompanied the flood of breaking news that's characterized early 2011." The project also led to the publication of "new research that uses Media Cloud to help us understand the structure of professional and citizen media in Russia and in Egypt." With the relaunch, anyone interested could use Media Cloud's tools to analyze "what bloggers and journalists are paying attention to, ignoring, celebrating or condemning."

Design Process
The MIT Media Lab and Harvard University's Berkman Center held ongoing discussions about the media landscape, and both faced a common obstacle: answering their questions required a way to process data about news stories at large scale. The motivation for creating Media Cloud came not from one particular question but from many. The planned data analysis would examine different aspects of news coverage, such as which media sources covered a story and in which languages. In the "About" section of the project website, the developers list some of the early driving questions the system was intended to answer:
 * Do bloggers introduce storylines into mainstream media or the other way around?
 * Is online media or print news more powerful in setting news agendas?
 * What parts of the world are being covered or ignored by different media sources?
 * Where and how do important news stories begin?
 * How are competing terms for the same event used in different publications?
 * Can we characterize the overall mix of coverage for a given source?
 * How do patterns differ between local and national news coverage?
 * Can we track news cycles for specific issues?
 * Do online comments shape the news?

How it works
First, Media Cloud chooses a set of media sources and discovers the feeds for each. Each feed is then crawled to detect any newly added stories. The full text of each relevant story is then extracted, leaving behind advertisements and navigation material. The text of each story is broken down into word counts, which show the word choices each media source makes when discussing a given topic. Finally, the word counts are analyzed and published to show trends in the data.
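
The steps above, which correspond to the five functions quoted at the start of the article (media definition, crawling, text extraction, word vectoring and analysis), can be illustrated with a minimal offline Python sketch. Everything in it is a hypothetical assumption: the source names, the sample HTML standing in for crawled feeds, and the helpers `extract_text`, `word_counts` and `source_vectors`. It is not Media Cloud's actual code or API, and real feed discovery, crawling and text extraction are considerably more involved.

```python
import re
from collections import Counter

# Hypothetical stand-in for crawled feeds (no network access): each media
# source maps to the raw HTML of stories newly found in its feed.
CRAWLED = {
    "source_a": [
        "<nav>Home | World</nav><p>Protesters gathered in the capital.</p>"
        "<div class='ad'>Subscribe today</div><p>Police watched the protest.</p>",
    ],
    "source_b": [
        "<nav>Menu</nav><p>Rioters clashed with police in the capital.</p>",
    ],
}


def extract_text(html: str) -> str:
    """Crude extraction heuristic (an assumption): keep only <p> paragraph
    text, which drops navigation bars and ads in these samples."""
    paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)
    return " ".join(paragraphs)


def word_counts(text: str) -> Counter:
    """'Word vectoring': reduce a story to lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def source_vectors(crawled: dict[str, list[str]]) -> dict[str, Counter]:
    """Sum the word counts of every story, per media source."""
    vectors: dict[str, Counter] = {}
    for source, stories in crawled.items():
        total: Counter = Counter()
        for html in stories:
            total.update(word_counts(extract_text(html)))
        vectors[source] = total
    return vectors


if __name__ == "__main__":
    vectors = source_vectors(CRAWLED)
    # Analysis step: compare competing terms for the same event across sources.
    for source, counts in vectors.items():
        print(source, {t: counts[t] for t in ("protesters", "rioters", "police")})
```

Running it prints `source_a {'protesters': 1, 'rioters': 0, 'police': 1}` and `source_b {'protesters': 0, 'rioters': 1, 'police': 1}`, a toy version of the "competing terms for the same event" question listed above.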

Uses and application
Media Cloud was used from September 2010 through January 2012 to obtain data for a study at the Berkman Center for Internet & Society that analyzed a set of 9,757 online stories related to the COICA-SOPA-PIPA debate. The open source application was utilized for the text and link analysis portion of the research. Findings from this research were published in July 2013.

The Berkman Center for Internet & Society website offers an interactive visualization map from this study, which was created to "depict media sources ("nodes", which appear as circles on the map with different colors denoting different media types) ... [and] track media sources and their linkages within discrete time slices and allows users to zoom into the controversy to see which entities are present in the debate during a given period ..." This map allows for the visualization of how the COICA-SOPA-PIPA controversy evolved over time by using link analysis.
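
The idea of tracking media sources and their linkages "within discrete time slices" can be sketched in a few lines of Python. This is an illustrative assumption, not the method behind the Berkman Center map: the link records and source names are hypothetical, monthly slices are an arbitrary choice, and in-link counts are used as a simple proxy for a source's prominence in each period.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical link records: (date of linking story, linking source, linked source).
LINKS: list[tuple[date, str, str]] = [
    (date(2011, 11, 2), "tech_blog", "advocacy_site"),
    (date(2011, 11, 20), "gaming_site", "tech_blog"),
    (date(2012, 1, 18), "newspaper", "advocacy_site"),
    (date(2012, 1, 18), "tech_blog", "advocacy_site"),
    (date(2012, 1, 19), "political_blog", "tech_blog"),
]


def slice_key(day: date) -> str:
    """Assign a link to a monthly time slice (the slice width is an assumption)."""
    return f"{day.year}-{day.month:02d}"


def nodes_by_slice(links: list[tuple[date, str, str]]) -> dict[str, set[str]]:
    """Sources present in the debate (linking or linked) during each slice."""
    present: dict[str, set[str]] = defaultdict(set)
    for day, src, dst in links:
        present[slice_key(day)].update((src, dst))
    return dict(present)


def inlinks_by_slice(links: list[tuple[date, str, str]]) -> dict[str, Counter]:
    """Within each slice, count how many links each source receives."""
    slices: dict[str, Counter] = defaultdict(Counter)
    for day, _src, dst in links:
        slices[slice_key(day)][dst] += 1
    return dict(slices)


if __name__ == "__main__":
    print(sorted(nodes_by_slice(LINKS)["2011-11"]))
    print(inlinks_by_slice(LINKS)["2012-01"].most_common(1))  # [('advocacy_site', 2)]
```

Zooming into a given period then amounts to reading a single slice, which shows which sources entered the debate when and which ones drew the most links.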

Companies are also building services to analyze and organize the kind of data that Media Cloud produces. RAMP, for example, offers a "cloud-based" way to analyze and create metadata.

Media Cloud has also been seen as changing the debate about media bias, turning it from a matter of journalists' personal opinions into a more data-driven discussion.

Confirmation bias is one way media bias develops: individuals seek out sources of information that align with beliefs they already hold. Its ultimate consequence is the creation of echo chambers, situations in which each side of a dispute (usually two) talks only to itself, which tends to stall the debate and hinder the creation of consensus. Media Cloud's analysis of the Gamergate controversy, a campaign of personal attacks on female game developers, showed two clusters of news sources and discussions, divided along the lines of their prevailing opinions. Although the analysis graph contained links between the two clusters, these did not reflect movement toward consensus. They were described as hate-links: sources in one cluster linked to sources in the other in order to strongly disagree with the linked content.
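
The cluster structure described above can be quantified, in a simplified way, by counting how many links cross from one cluster to another. The Python sketch below assumes the cluster labels are already known (for example, from a separate community-detection step) and uses hypothetical sources and links; it is not Media Cloud's analysis. Note that the graph alone cannot say whether a bridging link agrees or disagrees with its target: identifying hate-links requires reading the text around each link.

```python
# Hypothetical cluster labels and links between media sources.
CLUSTER: dict[str, str] = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
LINKS: list[tuple[str, str]] = [
    ("a1", "a2"), ("a2", "a3"), ("a3", "a1"),
    ("b1", "b2"), ("b2", "b1"), ("a1", "b1"),
]


def cross_cluster_links(
    links: list[tuple[str, str]], cluster: dict[str, str]
) -> list[tuple[str, str]]:
    """Links whose linking and linked sources sit in different clusters."""
    return [(src, dst) for src, dst in links if cluster[src] != cluster[dst]]


def cross_share(links: list[tuple[str, str]], cluster: dict[str, str]) -> float:
    """Fraction of all links that bridge clusters. A low share is consistent
    with echo chambers, but says nothing about the tone of bridging links."""
    return len(cross_cluster_links(links, cluster)) / len(links) if links else 0.0


if __name__ == "__main__":
    print(cross_cluster_links(LINKS, CLUSTER))  # [('a1', 'b1')]
```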

Influence
Media Cloud's key functionality comes from using web crawling to periodically fetch articles from various sources and then break them down into words that are counted. These word counts are then analyzed to determine what sources are saying about particular news. This process is not unique to Media Cloud; it is an application of stream algorithms, which have recently become popular. Stream algorithms operate on a continuous, unending stream of data rather than waiting for a complete batch of information to be assembled. They are useful because they allow trends to be monitored without knowing in advance which topics will be the most popular. This type of functionality first noticeably emerged among network managers trying to see, as traffic arrived, which sites had the highest traffic volumes. Stream algorithms have since been used by programs that act on financial information as it arrives, and by researchers whose experiments generate more data than can be analyzed, who use them to filter the initial data on the fly. Media Cloud similarly uses stream algorithms to associate words with news as it crawls various sources, and then provides its signature service: returning sentences and related media reports based on the words users are interested in.
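
The stream-processing idea described above can be illustrated with the Misra-Gries "frequent items" algorithm, a classic one-pass stream algorithm for finding heavy hitters (such as high-traffic sites or frequently used words) in bounded memory. This is a hypothetical Python sketch: nothing in the sources says Media Cloud uses Misra-Gries, and the helper names and sample stories are invented for illustration.

```python
import re
from collections.abc import Iterable, Iterator


def words(stories: Iterable[str]) -> Iterator[str]:
    """Turn a (possibly unending) stream of story texts into a stream of words."""
    for story in stories:
        yield from re.findall(r"[a-z']+", story.lower())


def misra_gries(stream: Iterable[str], k: int) -> dict[str, int]:
    """Misra-Gries summary: one pass, at most k - 1 counters. Any word that
    occurs more than n/k times in a stream of n words is guaranteed to remain;
    a remaining count may underestimate the true frequency by at most n/k."""
    counters: dict[str, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    stories = ["storm storm flood", "storm rescue", "storm flood"]
    print(misra_gries(words(stories), k=3))  # {'storm': 3, 'flood': 1}
```

Because `words` is a generator, the summary is updated word by word and never needs the whole batch in memory; here "storm" (4 of 7 words) is guaranteed to survive, although its count of 3 is an underestimate.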

Ideology
The Center for Civic Media states its goal as fostering political action and supporting civic media, both by providing a hub for technological innovation that can serve as tools for these purposes and by coordinating community-based design processes at municipal, national and global levels.

Media Cloud is one of the tools used by the Center for Civic Media. Because it can analyze news coverage across different sources, it has been used in many situations to draw conclusions about the history of news coverage of an event or topic, and to encourage civic engagement.

Media Cloud's developers intend it to support online activism by giving activists tools to measure the impact of the media stories they write. Using the tool's data analysis, activists can check how far their stories spread and identify important places they did not reach. The platform can then be used to share that content with the target groups it missed.

The project describes itself as seeking solutions through innovation.

SOPA/PIPA
Five researchers at the Berkman Center for Internet & Society at Harvard University published a study arguing that public debate was responsible for the failure of the Stop Online Piracy Act, and describing how it played that role. The study used Media Cloud to profile those engaged in the issue, and found participants from across the political spectrum. As for sources of coverage and platforms for discussion, the study says tech media, gaming sites and political blogs played a bigger role than mainstream media. Its main conclusion is that the diversity of political views, rather than polarization of the debate, enabled the creation of a consensus that pressured policymakers to vote against the bill.

Death of Trayvon Martin
The Pew Research Center classified the killing of Trayvon Martin, a teenager fatally shot in Sanford, Florida, by George Zimmerman, a neighborhood watch volunteer, as the most heavily covered news story with a social dimension in the preceding five years. A study conducted using Media Cloud argues that participation by alternative media helped shift the focus of the story from the shooting itself to a series of reports on racial segregation.

Abortion
A study by Julia Wejchert and Katherine Ida used Media Cloud to analyze the nature of news coverage of the abortion debate, focusing on the stories most shared on social media. According to the authors, the study's figures indicated that the coverage focused on legislation and activism. After hand-classifying news sources by political leaning (conservative, liberal, centrist and libertarian), they also pointed to differences in the imagery used by these clusters: what they classified as liberal media showed protests, mainstream media showed legislative photos, and conservative media showed images of a fetus or a live infant. Wejchert and Ida concluded from these results that the pro-choice side lacked a strategy, in contrast with what they saw as a well-shaped narrative on the anti-abortion side.

Charlie Hebdo and Baga massacre
During the first week of 2015, Islamist extremists carried out two separate attacks, one in Baga, Nigeria, and one in Paris. An analysis published by the media outlet The Conversation used Media Cloud to evaluate and compare news coverage of the two events. It found that, globally, the Charlie Hebdo shooting received far more coverage than the Baga massacre, and that coverage of the Paris attacks exceeded coverage of the Baga massacre even in Nigeria. The report attributed this uneven media attention not only to Eurocentrism but also to the difficulty of taking a side in the conflict between Boko Haram and the Nigerian Army.

Net Neutrality
A Berkman Klein Center study, based on Media Cloud data analysis, examined the Internet's influence on the reversal of the Federal Communications Commission's proposed net neutrality policy, and argued that the diversity of the debate helped shape the outcome.

Nirbhaya Case
News of the gang rape of a physiotherapy student, known as Nirbhaya, in New Delhi led to a rise in news coverage of sexual assault in India. The event was the focus of a study using Media Cloud that analyzed the nature of reporting on sexual assault. Based on their data analysis, the study's authors argued that news coverage had ignored gender inequality as a cause of such assaults and had treated them as isolated episodes for which only the individual perpetrators were to blame.

Future use
The day that Media Cloud relaunched, Ethan Zuckerman said, "We hope the tools we're providing are a complement to amazing efforts like Project for Excellence in Journalism's News Coverage and New Media indices--we consider their tools the gold standard for understanding what topics are discussed in American media. PEJ works their magic using talented teams of coders, who sample different corners of the media ecosystem to find out what's being discussed. We use huge data sets, algorithms, and automation to give a different picture, one focused on language instead of topic."

Future uses for Media Cloud could include smartphone or tablet applications that bring the platform to users away from a computer, where a Media Cloud app could serve as an on-the-go news source. If Media Cloud expanded to other kinds of information sites, it could target social media sites and incorporate news into them. Twitter and Facebook have added trending news and topic features similar to what Media Cloud aims to do.

The tool is expanding to sources that do not qualify as media in order to understand the repercussions of the events it studies. Social media is the main target, since it includes not only the sharing of news but also responses to the shared content.