User:EarnestlyFrank

Public Health and Social Media
Social media data analysis has been used in a variety of fields to gain insights from the massive quantities of data that users produce. Posts by Reddit users have been analyzed to find the language most effective at inspiring altruism, tweets posted after disasters have been analyzed to build real-time crisis maps, and sentiment analysis of tweets has been used to predict stock market performance. Social media analysis has also been applied in public health research, in three key areas:


 * 1) Syndromic Surveillance
 * 2) Drug Use
 * 3) HIV Risk-factors

Social Media and Syndromic Surveillance


Syndromic surveillance is a form of health monitoring in which instances of clinical syndromes are tracked across a whole population. Rather than tracking one specific illness, it looks at the collective symptoms reported in a population and infers likely illnesses from that data. It can be used to improve the allocation of medical resources, inform health policy, and educate the public about diseases. Because of the breadth of data required, traditional health care facilities cannot feasibly collect this data on their own. This has led to the development of “infodemiology”, in which the large amount of data already available online is used to track symptoms in real time. For example, Google Flu Trends estimated influenza rates from search queries, and search-query data has similarly been used to track Lyme disease. More recently, Twitter data has been used to track symptoms with a purpose-built topic model called the Ailment Topic Aspect Model (ATAM), along with a modified version (ATAM+) that incorporates keywords related to specific ailments. In tests, these models closely matched CDC records for widespread illnesses, agreed with records for specific geographic areas, and tracked seasonal ailments over time. However, the models could still be improved to follow the progression of an individual's illness, support finer geographic tagging, and distinguish tweets expressing sympathy from tweets by people who actually have the ailment.
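The keyword side of this idea can be illustrated with a minimal Python sketch. It is not ATAM or ATAM+, which learn ailment topics statistically; it is a much simpler keyword matcher. The keyword lists (`AILMENT_KEYWORDS`) and the sympathy phrases (`SYMPATHY_CUES`) are hypothetical. Each tweet is tagged with any ailments whose keywords it contains, tweets that look like expressions of sympathy are skipped, and tagged tweets are counted per ISO week.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical keyword lists. ATAM/ATAM+ learn ailment topics from data;
# this sketch only does simple keyword matching.
AILMENT_KEYWORDS = {
    "flu": {"flu", "fever", "chills", "aches"},
    "allergies": {"allergies", "sneezing", "pollen", "itchy"},
}

# Hypothetical phrases suggesting the author is sympathizing, not sick.
SYMPATHY_CUES = ("get well", "feel better", "hope you", "sorry to hear")

TOKEN = re.compile(r"[a-z']+")


def tag_ailments(text):
    """Return the set of ailments whose keywords appear in a tweet."""
    lowered = text.lower()
    if any(cue in lowered for cue in SYMPATHY_CUES):
        return set()  # skip likely sympathy tweets
    tokens = set(TOKEN.findall(lowered))
    return {name for name, words in AILMENT_KEYWORDS.items() if tokens & words}


def weekly_counts(tweets):
    """Count tagged tweets per (ISO year, ISO week, ailment)."""
    counts = Counter()
    for day, text in tweets:
        iso_year, iso_week, _ = day.isocalendar()
        for ailment in tag_ailments(text):
            counts[(iso_year, iso_week, ailment)] += 1
    return counts


# Invented example tweets, for illustration only.
example = [
    (date(2014, 1, 6), "Home with the flu and a fever"),
    (date(2014, 1, 7), "Get well soon, hope your flu clears up"),
    (date(2014, 1, 8), "Pollen everywhere, sneezing nonstop"),
]
print(weekly_counts(example))
```

Weekly counts like these are what a surveillance system would compare against official records. The phrase-based sympathy filter is deliberately naive, which reflects why separating sympathy from actual illness remains an open problem.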

Social Media and Drug Use


National drug-use surveys provide data at most once a year and cannot detect emerging drugs. To collect real-time data on both known and newly emerging drugs, researchers have turned to online forums such as http://drugs-forum.com as a source of unstructured data. Users on this forum are required to provide their country of origin, gender, and age range, which allows specific populations to be analyzed. Keyword frequencies and post dates were collected and compared with existing data, revealing a significant correlation for the majority of drugs. This indicates that the technique could be used to automate real-time analysis and to detect emerging drugs early. However, it would likely improve if combined with other data sources, such as emergency room records, poison control calls, and other related services. In addition, user anonymity limits how fine-grained the demographic data can be and makes it hard to estimate how well the forum represents the overall population. Despite these limitations, the technique is a significant improvement over annual national surveys and offers new insights that can be built on.
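The keyword-frequency comparison can be sketched as follows, assuming each forum post is available as a (date, text) pair. The drug name `novapill`, the posts, and the reference series are all invented for illustration, and Pearson correlation stands in for whatever statistic the original analysis used.

```python
import math
import re
from collections import Counter
from datetime import date


def monthly_mentions(posts, keyword):
    """Count posts per (year, month) that mention `keyword` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
    counts = Counter()
    for day, text in posts:
        if pattern.search(text.lower()):
            counts[(day.year, day.month)] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented forum posts about a hypothetical drug "novapill".
posts = [
    (date(2010, 1, 3), "Anyone tried novapill?"),
    (date(2010, 2, 9), "Novapill comedown was rough"),
    (date(2010, 2, 20), "More novapill questions"),
    (date(2010, 3, 1), "Novapill ban news"),
    (date(2010, 3, 5), "novapill again"),
    (date(2010, 3, 8), "Novapill dosing"),
    (date(2010, 3, 9), "Unrelated post about something else"),
]
months = [(2010, 1), (2010, 2), (2010, 3)]
mentions = monthly_mentions(posts, "novapill")
forum_series = [mentions[m] for m in months]
# Invented reference series, e.g. monthly counts from another data source.
reference_series = [10, 19, 31]
print(forum_series, round(pearson(forum_series, reference_series), 3))
```

Three months of data is far too short to be meaningful; a real analysis would use a long series per drug and test each correlation for significance.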

Social Media and HIV Risk-Factors


Despite a number of highly effective prevention techniques, HIV transmission rates remain high in certain high-risk populations. To better track, and possibly prevent, transmission, geo-tagged tweets can be combined with county-level demographic data to estimate transmission risk in real time from mentions of drug use and risky sexual activity. The resulting Twitter measures were found to have a highly significant positive relationship with existing county-level data. These results could guide geographically targeted prevention education, help evaluate the effect of community health centers and needle exchanges, and identify the areas that would benefit most from these resources.
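A sketch of the county-level comparison, assuming each relevant geo-tagged tweet has already been resolved to a county identifier. Tweet counts are normalized to a rate per 100,000 residents and compared with county-level reference rates using Spearman rank correlation. The counties, populations, tweets, and reference rates are hypothetical, and rank correlation is an illustrative choice rather than the statistical model used in the study.

```python
import math
from collections import Counter


def rate_per_100k(tweet_counties, population):
    """Turn a list of county IDs (one per relevant tweet) into rates per 100,000."""
    counts = Counter(tweet_counties)
    return {county: 100_000 * counts[county] / pop for county, pop in population.items()}


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counties, populations, tweets, and reference rates.
population = {"county_a": 50_000, "county_b": 200_000, "county_c": 100_000, "county_d": 400_000}
tweets = ["county_a"] * 10 + ["county_b"] * 20 + ["county_c"] * 30 + ["county_d"] * 8
reference_rate = {"county_a": 90, "county_b": 150, "county_c": 210, "county_d": 40}

rates = rate_per_100k(tweets, population)
counties = sorted(population)
rho = spearman([rates[c] for c in counties], [reference_rate[c] for c in counties])
print(rates, round(rho, 3))
```

Normalizing by population matters because raw tweet counts largely reflect how many people live, and tweet, in each county rather than the underlying risk.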