User:Alexander Chou (9919)/sandbox

Application of Graph Theory in Digital Humanities

Social network analysis, which draws many of its concepts from graph theory, has been applied in the social sciences for decades. In recent years, it has also been applied frequently in digital humanities, an interdisciplinary field that combines humanities subjects, such as history and literature, with information technology and computer science. Take history as an example. When historians try to work out the relationships between historical individuals, building a social network can help. In humanities research, it is important to study not only an individual’s life but also how the individual interacted with others and what role the individual played in his or her circle. Researchers can learn this information by examining how dense the edges are and how the nodes are connected in a graph.

Building a social network often requires collecting a large amount of data. A common way to collect data is to use text mining to extract it from digital text. Another way is to use the data fields in a database. The adjacency matrix from graph theory is used in this process. Both the rows and the columns of an adjacency matrix are indexed by the vertices: a row represents the “subject” of a relation, a column represents the “object”, and each entry records whether an edge connects the two. Adjacency matrices are often written in binary form, where 1 means two vertices are adjacent and 0 means they are not. Edges can also be assigned attributes and weights, which helps create a more detailed and characterized graph.
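
The binary adjacency matrix described above can be sketched in a few lines of Python. This is only an illustration: the relation pairs and the function name build_adjacency_matrix are hypothetical and do not come from any of the projects discussed here.

```python
def build_adjacency_matrix(relations, directed=False):
    """Return (vertices, matrix) for a list of (subject, object) pairs.

    Rows are indexed by the subject vertex and columns by the object vertex.
    An entry of 1 means the two vertices are adjacent, 0 means they are not.
    """
    vertices = sorted({name for pair in relations for name in pair})
    index = {name: i for i, name in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for subject, obj in relations:
        matrix[index[subject]][index[obj]] = 1
        if not directed:
            # An undirected relation is recorded in both directions.
            matrix[index[obj]][index[subject]] = 1
    return vertices, matrix


# Hypothetical relation data, for illustration only.
relations = [("A", "B"), ("A", "C"), ("C", "D")]
vertices, matrix = build_adjacency_matrix(relations)
for name, row in zip(vertices, matrix):
    print(name, row)
```

For an undirected network the matrix is symmetric. Passing directed=True keeps the subject and object roles distinct.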

One example is the China Biographical Database Project, a collaborative project among Academia Sinica, Harvard University, and Peking University. The China Biographical Database is a highly detailed relational database containing biographical data on over 470,000 individuals, spanning several dynasties from the 7th to the 19th century. The database is still being expanded. It contains not only an individual’s personal information, such as date and place of birth, but also the different types of relationships the individual was involved in. These different relations allow researchers to create graphs for specific networks. The resulting graph consists of nodes, which represent individuals, and edges, which can connect dozens to hundreds of nodes. The edges have attributes that record the kind of relation between nodes. Relations include general associations, scholarship, friendship, politics, writings, military, medicine, religion, family, and finance. Researchers can use these different relations to make a more detailed analysis of an individual’s network.
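
As a rough sketch of how edge attributes allow a specific network to be extracted, the example below filters a list of typed edges by relation type. The records, the identifiers P1 to P4, and the function name subnetwork are hypothetical. They are not taken from the database and do not reflect its actual schema.

```python
from collections import defaultdict

# Hypothetical edge records: (person, person, relation type).
edges = [
    ("P1", "P2", "family"),
    ("P1", "P3", "scholarship"),
    ("P2", "P3", "friendship"),
    ("P3", "P4", "scholarship"),
]


def subnetwork(edges, relation_type):
    """Return an undirected adjacency list with only edges of one relation type."""
    adjacency = defaultdict(set)
    for a, b, kind in edges:
        if kind == relation_type:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


scholarship = subnetwork(edges, "scholarship")
for person in sorted(scholarship):
    print(person, sorted(scholarship[person]))
```

Keeping the relation type on each edge means one data set can yield a separate graph for family ties, scholarship, or any other recorded relation.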

An example of the application of the China Biographical Database in digital humanities is a study by Ya-hwei Hsu of the social networks of antiquities collectors in the late Song dynasty. The study discusses how antiquities collectors in the Song dynasty were related to each other, how different circles were related to one another, and how the collector circles grew or shrank. To gather information on the collectors, Hsu used five books about artifact collection, each written by a different collector. Each book is treated as a circle. Besides the direct relations found in the books, Hsu used the China Biographical Database to check whether social relations, such as family ties, existed among the collectors. Network graphs were then created for each book. Each network shows two kinds of relations: the direct relation based on a shared collecting hobby, and the social relations among the collectors. The centrality and degree centrality of each graph were examined to analyze the result.

Social network analysis was also applied in another digital humanities study, conducted by Wei-Min Fan and Muh-Chyun Tang, which analyzed the best singer nominees of a music award. The main information collected about the nominees concerned their lyricists, composers, producers, and arrangers. The network graphs built in the study are directed graphs, which means the edges have directions. The directed graphs show how individuals collaborated. For example, if a lyricist wrote a song for a nominated singer, the lyricist is the source of the edge and the singer is the target. The number of times one individual collaborated with another was also taken into account, so the edges are weighted as well.
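
A directed, weighted collaboration network of this kind can be sketched as follows. The records and the helper names weighted_in_degree and weighted_out_degree are hypothetical. This is not the code or the data used by Fan and Tang.

```python
from collections import Counter

# Hypothetical collaboration records: (contributor, singer), one per song.
collaborations = [
    ("lyricist_1", "singer_1"),
    ("lyricist_1", "singer_1"),
    ("composer_1", "singer_1"),
    ("lyricist_1", "singer_2"),
]

# Each distinct (source, target) pair is a directed edge.
# Its weight is the number of times the pair collaborated.
weights = Counter(collaborations)


def weighted_in_degree(weights, node):
    """Total weight of the edges pointing at node."""
    return sum(w for (_, target), w in weights.items() if target == node)


def weighted_out_degree(weights, node):
    """Total weight of the edges leaving node."""
    return sum(w for (source, _), w in weights.items() if source == node)


print(weights[("lyricist_1", "singer_1")])
print(weighted_in_degree(weights, "singer_1"))
print(weighted_out_degree(weights, "lyricist_1"))
```

The edge direction keeps the roles apart: a singer accumulates in-degree from contributors, while a lyricist or composer accumulates out-degree.
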
The music-award study used measures such as average degree, path length, and other graph properties to examine the nominees’ network and its diversity.

Mapping the Republic of Letters is a collaborative project between Stanford University and Oxford University. The project aims to reconstruct the networks that thinkers, philosophers, and other intellectuals formed through letters and publications from the Renaissance to the Enlightenment. The data came from an MRL database consisting of 80,000 entities, which record historical documents, information about individuals, and locations. MySQL was used in the project to optimize database management and the queries used to create graphs. The project includes several case studies on famous historical figures, such as the astronomer Galileo Galilei. In Galilei’s case study, the network graph of his correspondence shows information such as whom he contacted frequently and the geographical distribution of his recipients. The density of the edges indicates the frequency of correspondence.
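
Average degree and path length, the two measures named above for the music-award study, can be computed on a small graph as a minimal sketch. The graph and the function names are hypothetical, and the graph is assumed to be undirected and unweighted for simplicity.

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def average_degree(graph):
    """Mean number of neighbours per node."""
    return sum(len(neighbors) for neighbors in graph.values()) / len(graph)


def shortest_path_lengths(graph, start):
    """Breadth-first search distances from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def average_path_length(graph):
    """Mean shortest-path length over ordered pairs of distinct, connected nodes.

    Assumes at least one pair of nodes is connected.
    """
    total = 0
    pairs = 0
    for start in graph:
        for target, distance in shortest_path_lengths(graph, start).items():
            if target != start:
                total += distance
                pairs += 1
    return total / pairs


print(average_degree(graph))
print(average_path_length(graph))
```

A short average path length means most individuals can reach one another through only a few intermediaries.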

Social Network Analysis/Modelling and visualization of networks

A representative network visualization is Six Degrees of Francis Bacon, a project conducted by Carnegie Mellon University and Georgetown University. It builds a social network around Francis Bacon that includes philosophers, politicians, and other related figures in 16th- and 17th-century England. The network features over 13,000 individuals and 200,000 relationships among them, with Francis Bacon at the origin of the graph. The data used to build the network were collected by extracting HTML text from the documents in the Oxford Dictionary of National Biography. The extracted texts were then processed using the NER (Named Entity Recognition) tools developed by Stanford’s Natural Language Processing Group and the LingPipe collection of tools. The NER tools recognize and extract the names mentioned in an individual’s document. Once identified, names were categorized by type, such as persons, organizations, or locations. The researchers collected a total of 58,625 documents and 13,309 personal names. An n × p matrix was created to store the data, where the n rows represent the documents and the p columns represent the people. Entry (i, j) of the matrix stores the number of times the name of person j appears in document i. A conditional Poisson distribution was used to define the relationships between nodes. The network built on top of the matrix features each node’s dates of birth and death, the social role of the node, and edges with different attributes. For example, some edges show that two nodes had a relationship of rivalry, while others show a family relationship. The project is not only a research outcome but also an educational tool that helps students learn the associations among historical figures and the visualization methods used to analyze social networks.
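
The document-by-person count matrix can be illustrated with a small sketch. The documents, names, and the function count_matrix are hypothetical. Plain whole-word matching stands in for the NER tools, and the Poisson model that the project used to infer relationships is not reproduced here.

```python
import re

# Hypothetical documents and names, for illustration only.
documents = {
    "doc_1": "Alice met Bob. Bob later wrote to Alice and to Carol.",
    "doc_2": "Carol was a rival of Bob.",
}
people = ["Alice", "Bob", "Carol"]


def count_matrix(documents, people):
    """Build an n x p matrix: rows are documents, columns are people.

    Entry (i, j) is the number of times the name of person j appears
    in document i, found here by simple whole-word matching.
    """
    matrix = []
    for text in documents.values():
        row = [
            len(re.findall(r"\b" + re.escape(name) + r"\b", text))
            for name in people
        ]
        matrix.append(row)
    return matrix


counts = count_matrix(documents, people)
for doc_id, row in zip(documents, counts):
    print(doc_id, row)
```

Two people who are often mentioned in the same documents are candidates for a relationship. That is the signal a statistical model fitted to such a matrix can use.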