Social genome

The social genome is the collection of data about members of a society that is captured in ever-larger and ever-more complex databases (e.g., government administrative data, operational data, and social media data). Some have used the term digital footprint to refer to the traces of individuals.

History
The term has been used in two distinct ways. First, it appeared in a letter to the editor submitted to Science in response to a seminal article by King on using big data for social science. The letter was published, but the term social genome was edited out of the published version. The original submission states, “A well-integrated federated data system of administrative databases updated on an ongoing basis could hold a collective representation of our society, our social genome.” Kum and others have continued to use the term since 2011, and it was defined in a peer-reviewed article in 2013, which states, “Today there is a constant flow of data into, out of, and between ever-larger and ever-more complex databases about people. Together, these digital traces collectively capture our social genome, the footprints of our society.” In 2014, a vision paper on population informatics elaborated further on the term.

Second, independently and at about the same time, a group of researchers led by the Brookings Institution started the Social Genome Project, which built a data-rich model to map the pathway to the middle class by tracing the life course from birth until middle age. The project's first paper was published in 2012.