User:Hckum/Population Informatics

The field of population informatics is the systematic study of populations via secondary analysis of massive data collections (termed “big data”) about people. Scientists in the field refer to this massive data collection as the social genome, denoting the collective digital footprint of our society. Population informatics applies data science to social genome data to answer fundamental questions about human society and population health, much as bioinformatics applies data science to human genome data to answer questions about individual health. It is an emerging research area at the intersection of the SBEH (Social, Behavioral, Economic, and Health) sciences, computer science, and statistics, in which quantitative methods and computational tools are used to answer these questions.

History
The term was first used in August 2012 when the [http://research.tamhsc.edu/pinformatics/about-us/#history Population Informatics Research Group] was founded at the University of North Carolina at Chapel Hill. The term was first defined in a peer-reviewed article in 2013 and further elaborated on in another article in 2014. The first [http://dmm.anu.edu.au/popinfo2015/ Workshop on Population Informatics for Big Data] was held at the ACM SIGKDD conference in Sydney, Australia, in August 2015.

Goals
Population informatics studies the social, behavioral, economic, and health sciences using massive data collections about people, also known as social genome data. Its primary goal is to increase the understanding of social processes by developing and applying computationally intensive techniques to social genome data.

Some of the important sub-disciplines are:
 * Business analytics
 * Social computing: social network data analysis
 * Policy informatics
 * Public health informatics
 * Computational journalism
 * Computational transportation science
 * Computational epidemiology
 * Computational economics
 * Computational sociology
 * Computational social science

Approaches
Record linkage, the task of finding records that refer to the same entity across different data sources, is a major activity in population informatics because most digital traces about people are fragmented across many heterogeneous databases that must be linked before analysis can be done.
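As a rough illustration of what record linkage involves, the following minimal Python sketch links two small, invented record sets. It assumes each record has hypothetical `id`, `name`, and `dob` fields, compares only records that share a birth date (a simple form of blocking), and scores name similarity with the standard library's `difflib.SequenceMatcher`. The field names, the sample data, and the 0.85 threshold are assumptions made for this example, not a method prescribed by the field; production linkage typically uses more careful probabilistic or privacy-preserving techniques.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so formatting differences do not hide a match."""
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for likely matches.

    Only records with the same date of birth are compared (blocking),
    and a pair is linked when name similarity reaches the threshold.
    The threshold value is an illustrative assumption.
    """
    by_dob = {}
    for rec in right:
        by_dob.setdefault(rec["dob"], []).append(rec)

    links = []
    for l_rec in left:
        for r_rec in by_dob.get(l_rec["dob"], []):
            score = name_similarity(l_rec["name"], r_rec["name"])
            if score >= threshold:
                links.append((l_rec["id"], r_rec["id"], score))
    return links


# Hypothetical records from two unrelated sources.
clinic = [
    {"id": "c1", "name": "Jon Smith", "dob": "1980-02-01"},
    {"id": "c2", "name": "Maria Garcia", "dob": "1975-11-23"},
]
benefits = [
    {"id": "b1", "name": "John  Smith", "dob": "1980-02-01"},
    {"id": "b2", "name": "Maria Garcia", "dob": "1990-05-05"},
    {"id": "b3", "name": "Alan Jones", "dob": "1975-11-23"},
]

links = link_records(clinic, benefits)
for left_id, right_id, score in links:
    print(left_id, right_id, f"{score:.2f}")
```

In this sketch only `c1` and `b1` are linked: the spelling variant "Jon"/"John" is tolerated, while records with an identical name but a different birth date, or the same birth date but a different name, are not.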

Once the relevant datasets are linked, the next task is usually to develop valid, meaningful measures to answer the research question. Because the data were collected for other purposes, with no intention of answering the question at hand, developing measures often involves iterating between inductive and deductive approaches with the data and the research question until usable measures emerge. Developing meaningful and useful measures from existing data is a major challenge in many research projects. In computational fields, these measures are often called features.
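To make the idea of deriving measures concrete, here is a minimal sketch that turns raw, event-level records into per-person features. The visit records, the `person_id` and `date` fields, the two measures (`visits_in_window` and `days_since_last_visit`), and the 365-day window are all hypothetical choices for illustration; real measures depend on the research question and the data at hand.

```python
from collections import defaultdict
from datetime import date


def visit_features(visits, as_of, window_days=365):
    """Aggregate event-level visit records into one row of measures per person.

    visits_in_window: number of visits in the window_days before as_of.
    days_since_last_visit: days between the most recent visit and as_of.
    Both measures are illustrative assumptions, not standard definitions.
    """
    counts = defaultdict(int)
    last_seen = {}
    for visit in visits:
        pid, when = visit["person_id"], visit["date"]
        if when > as_of:
            continue  # ignore events after the reference date
        if (as_of - when).days <= window_days:
            counts[pid] += 1
        if pid not in last_seen or when > last_seen[pid]:
            last_seen[pid] = when

    return {
        pid: {
            "visits_in_window": counts[pid],
            "days_since_last_visit": (as_of - last).days,
        }
        for pid, last in last_seen.items()
    }


# Hypothetical linked visit records.
visits = [
    {"person_id": "p1", "date": date(2015, 1, 10)},
    {"person_id": "p1", "date": date(2015, 6, 1)},
    {"person_id": "p1", "date": date(2013, 3, 5)},
    {"person_id": "p2", "date": date(2014, 12, 20)},
]

features = visit_features(visits, as_of=date(2015, 8, 1))
for pid, row in sorted(features.items()):
    print(pid, row)
```

Changing the window or the reference date changes the resulting measures, which is one reason measure development tends to be iterative.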

Finally, with the datasets linked and the required measures developed, the analytic dataset is ready for analysis. Common analysis methods include traditional hypothesis-driven research as well as more inductive approaches such as data science and predictive analytics.

Relation to other fields
Computational social science refers to the academic sub-disciplines concerned with computational approaches to the social sciences, in which computers are used to model, simulate, and analyze social phenomena. Its fields include computational economics and computational sociology. The seminal article on computational social science is Lazer et al. (2009), which summarized a workshop of the same title held at Harvard. However, the article does not define the term computational social science precisely.

In general, computational social science is a broader field that encompasses population informatics. Besides population informatics, it also includes complex simulations of social phenomena. Such simulation models often use results from population informatics to set their parameters to real-world values.

Population reconstruction is the multidisciplinary field concerned with reconstructing specific (historical) populations by linking data from diverse sources, yielding rich new resources for study by social scientists.

Related Groups & Workshops
The [http://dmm.anu.edu.au/popinfo2015/ first workshop on Population Informatics for Big Data] was held at the ACM SIGKDD conference in Sydney, Australia, in 2015. The workshop brought together computer science researchers as well as public health practitioners and researchers. This Wikipedia article was started at the workshop.

The [http://www.ihdln.org/ International Population Data Linkage Network (IPDLN)] facilitates communication between centres that specialize in data linkage and users of the linked data. The producers and users alike are committed to the systematic application of data linkage to produce community benefit in the population and health-related domains.

Challenges
Three major challenges specific to population informatics are (1) the need for privacy protection of the subjects of the data, (2) the need for error bounds on the results so that real decisions that have direct impact on people can be made based on these results, and (3) scalability.