User:Atske

Electoral Process and Twitter
The effects of Twitter on the electoral process can be seen in a number of ways, including:
 * Astroturfing Campaigns on Twitter
 * Predicting Elections using Twitter
 * Paper Three Title

Astroturfing Campaigns on Twitter
This paper analyzes the diffusion of information in social media by trying to identify and track orchestrated, deceptive efforts to mimic the organic spread of information through the Twitter network. In this manner, it aims to distinguish astroturfing campaigns from true grassroots political movements. This is important because of the increasing role of social media in the success or failure of modern political campaigns, as the grassroots organizing of the Obama 2008 campaign highlights. Given this influence, a motivated attacker can easily orchestrate a distributed effort to imitate organic grassroots organization in order to spread misinformation and influence the public beyond the confines of the social network. It is therefore relevant to find a way to separate true political discourse from artificially seeded astroturfing campaigns.

Data was collected through the Twitter ‘gardenhose’ streaming API (dev.twitter.com/pages/streaming_api), which provided detailed data on tweets at a rate of roughly 4 million per day. The rate had increased to 8 million tweets per day by the time of publication and can be expected to be significantly higher still at present. The researchers implemented a system, named “Truthy,” to monitor the data stream from Twitter, detect relevant memes, and produce basic features to pass to the meme classifier and to visualize. Only tweets related to US politics and of sufficiently general interest were stored, and these were further filtered to those containing the memes of interest. 305 million tweets were tracked from September 14 to October 27, 2010, of which 1.2 million contained a political keyword; meme filtering reduced the number to 600,000. The data is unstructured, consisting of the tweet text along with Twitter metadata.
The features used to identify different memes are: hashtags (tokens beginning with #), mentions (a username preceded by @), URLs (strings of URL characters beginning with ‘http://’), and phrases (the entire text of the tweet after removing metadata, punctuation, and URLs).
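The four meme types listed above can be illustrated with a minimal Python sketch. This is not the Truthy implementation: the function name `extract_memes`, the regular expressions, the lowercasing of phrases, and the choice to treat hashtags and mentions as "metadata" when building the phrase are all assumptions made for illustration.

```python
import re
import string

# Assumed patterns; the summary only describes the meme types informally.
URL_RE = re.compile(r"http://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def extract_memes(tweet_text):
    """Return the hashtags, mentions, URLs, and phrase found in one tweet."""
    urls = URL_RE.findall(tweet_text)
    # Remove URLs first so that a '#' or '@' inside a URL is not
    # mistaken for a hashtag or a mention.
    text = URL_RE.sub(" ", tweet_text)
    hashtags = HASHTAG_RE.findall(text)
    mentions = MENTION_RE.findall(text)
    # Assumption: hashtags and mentions count as metadata to strip.
    text = HASHTAG_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Drop ASCII punctuation, then normalize whitespace and case.
    text = text.translate(str.maketrans("", "", string.punctuation))
    phrase = " ".join(text.lower().split())
    return {
        "hashtags": hashtags,
        "mentions": mentions,
        "urls": urls,
        "phrase": phrase,
    }


if __name__ == "__main__":
    sample = "RT @alice: Vote today! #election http://example.com/a#b"
    print(extract_memes(sample))
```

Each extracted value can then serve as a key under which tweets are grouped, so that all tweets sharing a hashtag, mention, URL, or phrase are treated as instances of the same meme.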

Data was analyzed by collecting statistics on the topology of the diffusion networks and by sentiment analysis of the tweets. Memes were run through classifiers that labeled each one as truthy or legitimate, with high success rates (the best classifier achieved >96% accuracy). The researchers identified numerous examples of astroturfing campaigns attempting to influence the 2010 US midterm elections. Most memes were found to have small diffusion networks, which is important because early termination of astroturfs is critical: if an astroturf succeeds in gaining the attention of the community, its spreading pattern becomes indistinguishable from that of an organic meme. Overall, the research showed promising results, with success in detecting political astroturfing campaigns on Twitter. Further research could extend the approach to other social platforms and increase the number of features used in classification.
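As a rough illustration of what "statistics on the topology of the networks" can mean, the sketch below computes a few simple measures for one meme's diffusion network. The summary does not list which statistics the paper used, so the particular measures here (node and edge counts, mean degree, maximum in- and out-degree, size of the largest weakly connected component), the function name `diffusion_stats`, and the edge representation are assumptions.

```python
from collections import defaultdict


def diffusion_stats(edges):
    """Summarize a directed diffusion network.

    edges: iterable of (source_user, target_user) pairs, for example
    one pair per retweet or mention (hypothetical input format).
    """
    edge_set = set(edges)  # collapse repeated interactions
    nodes = set()
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    neighbors = defaultdict(set)  # undirected view for components
    for src, dst in edge_set:
        nodes.update((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Largest weakly connected component, found with an iterative DFS.
    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        largest = max(largest, size)

    return {
        "nodes": len(nodes),
        "edges": len(edge_set),
        "mean_degree": 2 * len(edge_set) / len(nodes) if nodes else 0.0,
        "max_in_degree": max(in_deg.values(), default=0),
        "max_out_degree": max(out_deg.values(), default=0),
        "largest_component": largest,
    }


if __name__ == "__main__":
    print(diffusion_stats([("a", "b"), ("a", "c"), ("d", "e")]))
```

A vector of such values per meme is the kind of input a classifier could be trained on, although the actual feature set and classifier are not specified in this summary.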

Predicting Elections using Twitter
As social media usage grows, political discourse on these platforms is becoming more common, with 22% of adult internet users engaging with the 2010 US elections through social networks. Given the apparent predictive power of social media, claims have been made that social media data can accurately predict elections. This paper questions such attempts and argues that social media performs no better than chance at predicting elections. The accuracy of these predictions matters because of the gravity that elections carry and the increasing difficulty of conducting traditional polling. Factors such as demographics, determining whether a citizen is likely to vote, and obtaining unbiased samples are crucial to making better use of social media for predictive purposes. To test their hypotheses, the authors attempted to replicate past techniques for using Twitter to predict electoral outcomes, in order to see whether past success was due to chance.

Two separate data sets related to US elections held in 2010 were used. The first set contained 234,697 tweets from 56,165 users about the 2010 US Senate special election in Massachusetts (held on January 19th). This data was collected from January 13 to 20, 2010 using the Twitter streaming API, retrieving tweets containing the name of either candidate. The second data set contained all tweets from the Twitter gardenhose API during the week of October 26 to November 1, 2010, immediately preceding the US Congressional elections held on November 2, 2010. After filtering for tweets containing the names of candidates in five contested Senate races, 13,019 tweets from 6,970 users were collected. The tweets are unstructured data, with the text content of each tweet being used for the research. The features used were the number of mentions received by a candidate and a polarity lexicon that identified the number of positive, negative, and neutral words. Using these features, two prediction methods were applied: one took each candidate’s relative volume of mentions as that candidate’s vote share, and the other applied a formula to the counts of positive and negative tweets from the sentiment analysis to compute each candidate’s vote share. These predictions were then compared to the actual election returns to assess the accuracy of each method.
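The two prediction methods can be sketched as follows for a two-candidate race. The volume method follows directly from the description above. The summary does not state the sentiment formula, so the one used here (positive tweets about a candidate plus negative tweets about the opponent, divided by all polar tweets about both) is an assumption, as are the function names and the counts in the usage example, which are made up and not taken from the paper.

```python
def share_by_volume(mentions):
    """Vote share as each candidate's fraction of all mentions.

    mentions: dict mapping candidate name to number of mentions.
    """
    total = sum(mentions.values())
    if total == 0:
        raise ValueError("no mentions to base a prediction on")
    return {name: count / total for name, count in mentions.items()}


def share_by_sentiment(pos, neg, c1, c2):
    """Vote share from polar tweet counts (assumed formula).

    pos, neg: dicts mapping candidate name to the number of positive
    or negative tweets about that candidate. A negative tweet about
    one candidate is counted as support for the other.
    """
    denom = pos[c1] + neg[c1] + pos[c2] + neg[c2]
    if denom == 0:
        raise ValueError("no polar tweets to base a prediction on")
    share_c1 = (pos[c1] + neg[c2]) / denom
    return {c1: share_c1, c2: 1.0 - share_c1}


if __name__ == "__main__":
    # Made-up counts for two hypothetical candidates, A and B.
    print(share_by_volume({"A": 60, "B": 40}))
    print(share_by_sentiment({"A": 30, "B": 20}, {"A": 10, "B": 40}, "A", "B"))
```

In both methods the candidate with the larger predicted share is taken as the predicted winner, which is what gets compared against the actual election returns.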

Each technique correctly predicted the result in half the races, which is no better than chance. For the Massachusetts data set, overall Twitter volume was indicative of the election result, but when only pre-election tweets were considered the prediction was incorrect. The mean absolute error (MAE) was 17.1% for the Twitter volume method and 7.6% for sentiment analysis, while polling has a typical MAE of around 2-3%. Further analysis was conducted to determine why the sentiment analysis performed better. As part of this testing, the authors attempted to derive each user’s political orientation from their tweets to see whether sentiment analysis can reveal it. However, this analysis showed only a very weak correlation. The authors concluded that the sentiment analysis has poor accuracy, performing only slightly better than random, that it fails to account for misleading propaganda, and that it is unable to predict the political orientation of users. The authors attribute the results of the experiment to the inability of social media data to account for likelihood of voting and for demographics, and to its voluntary response bias and lack of random sampling. Further research is needed on sentiment analysis methods for political conversation and on the nature of political conversation on social media, in order to develop a deeper understanding of the dynamics involved.
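For reference, the MAE figures above are averages of the absolute gap between predicted and actual vote share across races. A minimal sketch of that calculation is given below; the function name and the numbers in the usage example are illustrative only and are not data from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between paired values.

    predicted, actual: equal-length sequences of vote shares, one
    entry per race, expressed in percentage points.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one race is required")
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up vote shares for two hypothetical races.
    predicted = [60.0, 45.0]
    actual = [52.0, 48.0]
    print(mean_absolute_error(predicted, actual))
```

Because the errors are taken as absolute values, overestimates and underestimates do not cancel out, so a low MAE requires the predictions to be close in every race rather than correct only on average.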

Paper Three Title
Paper Three Summary