User:Katherineheins/sandbox

=Data Science, Mood, and Behavior on Social Media=

Applications of Social Media Data
Data science is one of the most rapidly growing branches of science, and its methods apply across many fields of study. For instance, Twitter data has been used to ascertain how day length and season affect people’s mood worldwide. Researchers have also used Reddit data to characterize abstinence from tobacco or alcohol, which could be useful for medical practice and policy in treating addiction, with the future possibility of early warning systems or relapse interventions delivered through social media. Other uses for Twitter data include matching tweets with census data to study social connections and regional demographics, and predicting election results from tweets, as researchers attempted to do with the 2009 German federal election. Finally, social media data has applications in the mental health field, such as using the language of tweets from users who have self-identified as having a mental illness to predict mental illness among other users, or analyzing how suicide-related content on Reddit changes after a celebrity dies by suicide. These are only some of the ways that social media data can be harnessed to aid many different fields.


 * 1) Measuring Mood with Twitter
 * 2) Drug Abstinence on Reddit
 * 3) Demographics of Twitter Interactions



Measuring Mood with Twitter
Data science can be used to study how individual mood changes, first in relation to diurnal cycles and then in relation to seasons. Researchers can compile millions of tweets from millions of users worldwide, use the Linguistic Inquiry and Word Count (LIWC) lexicon to extract features from the unstructured text, and then apply linear regression to relate positive affect and negative affect to the time of day and the length of day. Such research has found a morning rise and a nighttime peak in positive affect, and a sharp drop in negative affect over the overnight hours. People generally show more positive affect on weekends, but the morning rise occurs about two hours later than it does on weekdays, suggesting that people sleep later on weekends. Results also differ across chronotypes, meaning that “morning people” and night owls have different moods at different times of day. This research also found that mood does not vary with absolute day length, but instead with whether day length is increasing or decreasing. One limitation of this field of research is that little data is available on Twitter users’ backgrounds, which can influence sleep, environmental stress, and access to social support, all of which can affect mood.
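The lexicon-then-regression pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the word lists below are tiny hypothetical stand-ins for LIWC categories (the real LIWC dictionary is licensed and much larger), the function names are invented for this example, and the sample tweets are made up rather than taken from any study.

```python
from collections import defaultdict

# Hypothetical stand-ins for LIWC affect categories (assumption, not LIWC itself).
POSITIVE_WORDS = {"happy", "great", "love", "awesome", "glad"}
NEGATIVE_WORDS = {"sad", "angry", "hate", "tired", "awful"}


def affect_scores(text):
    """Return (positive, negative) affect as fractions of the words in text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE_WORDS for w in words) / len(words)
    neg = sum(w in NEGATIVE_WORDS for w in words) / len(words)
    return pos, neg


def mean_affect_by_hour(tweets):
    """Average positive affect per local hour; tweets are (hour, text) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for hour, text in tweets:
        pos, _ = affect_scores(text)
        totals[hour][0] += pos
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up example tweets as (local hour, text) pairs.
sample = [(6, "so tired this morning"), (9, "love this great coffee"),
          (9, "happy to be here"), (22, "awesome night, love it")]
by_hour = mean_affect_by_hour(sample)
hours = sorted(by_hour)
slope, intercept = fit_line(hours, [by_hour[h] for h in hours])
print(by_hour, slope, intercept)
```

A straight line over the hour of day is the simplest possible model and cannot capture a morning rise followed by a later peak; a real analysis would more plausibly regress on hour-of-day indicators or on the change in day length, which this sketch leaves out.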



Drug Abstinence on Reddit
Another use of social media in data science focuses on how language and interactions on social media can characterize long-term abstinence from tobacco or alcohol. Data for such work can be collected by observing activity on two subreddits, StopDrinking (SD) and StopSmoking (SS). This Reddit data consists of posts and comments in which redditors self-report information on their abstinence from smoking or drinking. Features of these posts include whether the abstainer is classified as short term or long term, as well as linguistic characteristics, addiction vocabulary, and interactions with other users. Another useful tool in data science is machine learning models, which can classify users as short-term or long-term abstainers and find the characteristic phrases or themes associated with each group. In this field, it is likely that the support networks on the forums play an important role in encouraging long-term abstinence. There are limitations to this research, stemming from the format of the site and the inability to collect comprehensive data on the abstainers, but the goal of the work is often merely to characterize short-term and long-term abstinence, not to predict it.
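As a rough illustration of classifying posts and surfacing characteristic words, the sketch below uses a bag-of-words naive Bayes classifier with add-one smoothing. The choice of naive Bayes is an assumption made for this example; the text above does not say which learning models are used. The class name, its methods, and the sample posts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def _log_likelihood(self, word, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.labels:
            score = math.log(self.doc_counts[label] / n_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += self._log_likelihood(word, label)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def characteristic_words(self, label, other, top=3):
        """Words with the highest log-likelihood ratio for label versus other."""
        ratio = {w: self._log_likelihood(w, label) - self._log_likelihood(w, other)
                 for w in self.vocab}
        return sorted(ratio, key=ratio.get, reverse=True)[:top]


# Invented posts for illustration only; not real Reddit data.
posts = ["day three and the cravings are awful",
         "first week cravings are strong today",
         "two years sober and grateful for this community",
         "one year smoke free and grateful every day"]
groups = ["short", "short", "long", "long"]
model = NaiveBayes().fit(posts, groups)
print(model.predict("the cravings are strong"))
print(model.characteristic_words("long", "short"))
```

The log-likelihood ratio is one simple way to rank words by how strongly they are associated with one group; on real data it would need far more posts and some handling of very common words to give meaningful themes.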



Demographics of Twitter Interactions
Finally, another application of data science focuses on how psychological and demographic factors affect the structure of online social interactions on Twitter, which in turn can affect people’s online experience and the quality of information they receive. Researchers are able to collect millions of geo-referenced tweets from a populous region such as Los Angeles County and then link the tweets to US Census tracts. They can also use Twitter mentions to measure the strength of social ties and the spatial diversity of social relationships, and relate both to socioeconomic status and emotions. Data features for these kinds of tweets include positive and negative sentiment; the valence, arousal, and dominance of emotion; mention frequency; and spatial diversity. To analyze the tweets, it is possible to use the SentiStrength program and the WKB lexicon, along with Twitter mention frequency. Regression can then measure social ties against both emotion and demographics. The results of this field of study show that places where people engage less frequently but with more diverse contacts have more positive posts, and that their residents are better educated, younger, and more affluent. Stronger social ties are correlated with more negative emotions, as is lower spatial diversity, and tracts with higher numbers of Hispanic residents had stronger social ties. Here, data science and Twitter are able to identify a social inequity: because weak ties play a role in delivering new information, privileged people may occupy network positions that give them greater access to information than their less privileged counterparts.
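The two network measures mentioned above can be sketched as follows. The definitions used here are assumptions made for illustration: tie strength is taken as the average number of mentions per distinct sender and receiver pair, and spatial diversity as the Shannon entropy of the tracts that a tract's mentions point to. The record fields (`sender`, `receiver`, `from_tract`, `to_tract`) and the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tie_strength(mentions):
    """Average mentions per distinct (sender, receiver) pair, by sender tract."""
    pair_counts = defaultdict(Counter)
    for m in mentions:
        pair_counts[m["from_tract"]][(m["sender"], m["receiver"])] += 1
    return {tract: sum(c.values()) / len(c) for tract, c in pair_counts.items()}


def spatial_diversity(mentions):
    """Shannon entropy (in bits) of the tracts each tract's mentions point to."""
    targets = defaultdict(Counter)
    for m in mentions:
        targets[m["from_tract"]][m["to_tract"]] += 1
    result = {}
    for tract, counts in targets.items():
        total = sum(counts.values())
        result[tract] = -sum((n / total) * math.log2(n / total)
                             for n in counts.values())
    return result


def mention(sender, receiver, from_tract, to_tract):
    """Build one mention record (hypothetical schema)."""
    return {"sender": sender, "receiver": receiver,
            "from_tract": from_tract, "to_tract": to_tract}


# Made-up mention records: tract A spreads mentions over two tracts,
# tract B repeatedly mentions a single contact.
records = [mention("alice", "bob", "A", "B"),
           mention("alice", "carol", "A", "C"),
           mention("bob", "alice", "B", "A"),
           mention("bob", "alice", "B", "A"),
           mention("bob", "alice", "B", "A")]
print(tie_strength(records))
print(spatial_diversity(records))
```

Per-tract values like these could then serve as the dependent or independent variables in the regressions against sentiment and census demographics; that step is omitted here because it depends on data the sketch does not have.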