Citation graph



A citation graph (or citation network), in information science and bibliometrics, is a directed graph that describes the citations within a collection of documents.

Each vertex (or node) in the graph represents a document in the collection, and each edge is directed from one document toward another that it cites (or vice versa depending on the specific implementation).
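
As an illustration of the vertex and edge definition above, the following minimal sketch stores a citation graph as two adjacency maps, one per edge direction, so that either convention can be queried. The document identifiers and the function name `build_citation_graph` are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical toy collection: each pair is (citing document, cited document).
citations = [
    ("C", "A"),
    ("C", "B"),
    ("D", "C"),
    ("D", "A"),
]

def build_citation_graph(pairs):
    """Return (cites, cited_by): cites[x] holds the documents x cites,
    cited_by[x] holds the documents that cite x."""
    cites = defaultdict(set)
    cited_by = defaultdict(set)
    for citing, cited in pairs:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by

cites, cited_by = build_citation_graph(citations)
print(sorted(cites["D"]))     # documents that D cites
print(sorted(cited_by["A"]))  # documents that cite A
```

Keeping both maps costs extra memory but makes "what does this document cite?" and "what cites this document?" equally cheap to answer.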

Citation graphs have been utilised in various ways, including forms of citation analysis, academic search tools and court judgements. They are predicted to become more relevant and useful in the future as the body of published research grows.

Implementation
There is no standard format for the citations in bibliographies, and the record linkage of citations can be a time-consuming and complicated process. Furthermore, citation errors can occur at any stage of the publishing process. However, citation databases, also known as citation indexes, have a long history, so these problems are well documented.

In principle, each document should have a unique publication date and can only refer to earlier documents. This means that an ideal citation graph is not only directed but acyclic; that is, it contains no cycles. This is not always the case in practice, since an academic paper goes through several versions in the publishing process. The timing of asynchronous updates to bibliographies may lead to edges that apparently point backward in time. Such "backward" citations seem to constitute less than 1% of the total number of links.
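
The two properties described above can be checked mechanically. The sketch below is one possible approach, not a reference implementation: it flags "backward" citations by comparing publication years and tests acyclicity with Kahn's topological-sort algorithm. The years, edges, and function names are hypothetical, and year granularity is a simplification, since same-year citations are flagged as backward here.

```python
from collections import deque

# Hypothetical publication years and citation edges (citing -> cited).
year = {"A": 2001, "B": 2003, "C": 2005, "D": 2005}
edges = [("B", "A"), ("C", "B"), ("C", "D"), ("D", "C")]

def backward_citations(edges, year):
    """Edges whose cited document is not strictly older than the citing one."""
    return [(u, v) for u, v in edges if year[v] >= year[u]]

def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic exactly when every node can be
    removed in topological order."""
    nodes = {n for edge in edges for n in edge}
    out = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in out[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(nodes)

print(backward_citations(edges, year))
print(is_acyclic(edges))
```

In this toy data, C and D cite each other, which is the kind of cycle that asynchronous bibliography updates can produce.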

As citation links are meant to be permanent, the bulk of a citation graph should be static, and only the leading edge of the graph should change. Exceptions might occur when papers are withdrawn from circulation.

Background and history
A citation is a reference to a published or unpublished source (not always the original source). More precisely, a citation is an abbreviated alphanumeric expression embedded in the body of an intellectual work that denotes an entry in the bibliographic references section of the work. Its purpose is to acknowledge the relevance of the works of others to the topic of discussion at the point where the citation appears.

Generally the combination of both the in-body citation and the bibliographic entry constitutes what is commonly thought of as a citation (whereas bibliographic entries by themselves are not). References to single, machine-readable assertions in electronic scientific articles are known as nanopublications, a form of micro attributions.

Citation networks are one kind of social network that has been studied quantitatively almost from the moment citation databases first became available. In 1965, Derek J. de Solla Price described the inherent linking characteristic of the Science Citation Index (SCI) in his paper entitled "Networks of Scientific Papers." The links between citing and cited papers became dynamic when the SCI began to be published online. In 1973, Henry Small published his work on co-citation analysis, which became a self-organizing classification system that led to document clustering experiments and eventually what is called "Research Reviews."

Citation analysis
Citation graphs can be used to measure scholarly impact, that is, the influence a particular paper has had on the academic world. Although scholarly impact is hard to quantify, an estimate of it across many papers helps identify important ones, and it can also indicate the relevance of a particular academic community. Citation graphs are well suited to this because the number of incoming links to a paper in the graph is the number of times it has been cited by other papers, which is commonly taken as a proxy for its impact.
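
The simplest such measure is a raw citation count, which corresponds to the number of incoming edges of a node when edges point from citing to cited document. The following sketch computes it for a hypothetical edge list; it illustrates only this basic count, not any particular published impact metric.

```python
from collections import Counter

# Hypothetical citation edges (citing -> cited).
edges = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "C")]

# Count how often each document appears as the target of a citation.
citation_counts = Counter(cited for _, cited in edges)

# Documents ordered from most cited to least cited.
ranked = citation_counts.most_common()
print(ranked)
```

Documents that are never cited, such as D here, do not appear in `ranked`; looking them up in `citation_counts` returns 0.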

Similarity analysis is another area of citation analysis that frequently makes use of citation graphs. The relationship between two papers in the citation graph has been compared to their text-based similarity, and closeness in the citation graph has been found to predict a level of text-based similarity. Additionally, the two methods, citation graph closeness and traditional content-based similarity, have been found to work well in conjunction to produce a more accurate result.
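
The text does not say how closeness in the citation graph is measured. One plausible reading, assumed here purely for illustration, is the number of citation links on the shortest path between two papers when edge direction is ignored. The sketch below computes that distance with a breadth-first search; the function name and data are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical citation edges (citing -> cited).
edges = [("B", "A"), ("C", "A"), ("D", "C"), ("F", "E")]

def citation_distance(edges, source, target):
    """Shortest number of citation links between two papers, ignoring
    edge direction. Returns None if the papers are not connected."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(citation_distance(edges, "B", "C"))  # B and C both cite A
print(citation_distance(edges, "B", "E"))  # no connecting path
```

A smaller distance would then be treated as higher closeness, which could be combined with a separate text-based similarity score.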

Analyses of citation graphs have also led to proposals to use them to identify different communities and research areas within the academic world. Analysing the citation graph for groups of documents in conjunction with keywords has been found to provide an accurate way to identify clusters of similar research. In a similar vein, the main “stream” of an area of research, or the progression of a research idea over time, can be identified by running depth-first search algorithms on the citation graph. Instead of looking at similarity between two nodes, or clusters of many nodes, this method follows the links between nodes to trace a research idea back to its beginning, and so reveals its progression through different papers to its current state.
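
A depth-first trace of this kind can be sketched as follows. The criterion used here, picking the longest chain of citations from a paper back to one with no references in the collection, is an assumption made for the example and not necessarily the one used in the literature. The graph is assumed to be acyclic, and all names are hypothetical.

```python
# Hypothetical adjacency map: paper -> papers it cites.
cites = {
    "E": ["D", "B"],
    "D": ["C"],
    "C": ["A"],
    "B": ["A"],
    "A": [],
}

def longest_citation_chain(cites, start):
    """Depth-first search along 'cites' links. Returns the longest chain
    from `start` back to a paper that cites nothing in the collection.
    Assumes the graph has no cycles."""
    best = [start]
    for cited in sorted(cites.get(start, ())):
        chain = [start] + longest_citation_chain(cites, cited)
        if len(chain) > len(best):
            best = chain
    return best

print(longest_citation_chain(cites, "E"))
```

On a large graph the recursion would revisit shared ancestors many times, so a practical version would cache the result for each paper.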

Search tools
The traditional method used by academic search tools is to check for matches between a search term and keywords in papers to return potential matches. While mostly effective, this method can lead to errors where a paper is recommended from a different discipline because of keyword matches even when the two topics actually have little in common.

Many have argued that this way of searching for relevant papers could be improved and made more accurate if citation graphs were incorporated into academic paper search tools. For example, one system was proposed which used both the keyword system and a popularity system based on how many connections a paper had in the citation graph. In this system, more connected papers were considered more popular and therefore given a higher weighting in the paper recommendation system.
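
A combined ranking of this kind might look like the sketch below. The scoring formula, the `popularity_weight` parameter, and the paper records are all hypothetical; the proposed system's actual weighting is not given in the text. Keyword matches are counted directly, and the citation count is added on a logarithmic scale so that very highly cited papers do not swamp the keyword signal.

```python
import math

# Hypothetical paper records: keywords plus citation count from the graph.
papers = {
    "P1": {"keywords": {"graph", "citation"}, "citations": 120},
    "P2": {"keywords": {"graph", "citation"}, "citations": 3},
    "P3": {"keywords": {"graph", "protein"}, "citations": 300},
}

def rank(papers, query_terms, popularity_weight=0.5):
    """Rank papers that match at least one query term, scoring each by
    keyword matches plus a log-scaled citation-count bonus."""
    scores = {}
    for paper_id, paper in papers.items():
        matches = len(paper["keywords"] & query_terms)
        if matches == 0:
            continue  # papers with no keyword match are not returned
        scores[paper_id] = matches + popularity_weight * math.log1p(paper["citations"])
    return sorted(scores, key=scores.get, reverse=True)

print(rank(papers, {"citation", "graph"}))
```

With these numbers the well-cited P3 outranks P2 despite matching fewer keywords, which shows how the weighting shifts results toward more connected papers.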

In more recent years, visual search tools have been developed which use citation graphs to provide a visual representation of the connections between papers. A notable pioneer in this concept is the search tool Connected Papers, which began as a small project between friends and was released to the public in 2020. Given one academic paper, it analyses tens of thousands of other papers, selects those relevant to the origin paper, builds a citation graph from them, and returns a visual representation of it to the viewer. This way of looking at research lets the viewer see an area of research at a glance, which can aid in understanding the state of that area and in quickly identifying key papers with many connections.

Court judgements
Citation graphs have a history of being used to organise and map citations between legal documents. In a similar way to the aforementioned search tools, citation graphs built specifically for the types of citations found in legal documents have been used to find relevant past legal documents when they are needed for a court decision. As a replacement for, or an improvement upon, traditional search methods, this citation-graph-aided way of organising legal documents can provide higher efficiency, accuracy, and organisation.

Related networks
There are several other types of network graphs that are closely related to citation networks. The co-citation graph has documents as nodes, with two documents connected if they are both cited by the same third document; in the closely related bibliographic coupling graph, two documents are connected if they both cite the same work (see Co-citation and Bibliographic coupling). Other related networks are formed using other information present in the document. For instance, in a collaboration graph, known in this context as a co-authorship network, the nodes are the authors of documents, linked if they have co-authored the same document. The link weight between two authors in a co-authorship network can increase over time as they collaborate further.
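
Both co-citation and bibliographic coupling links can be derived directly from a citation graph. The sketch below does so for a hypothetical set of reference lists: two documents are co-cited when some third document cites both, and two documents are bibliographically coupled when they cite at least one work in common. The function names are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing document -> set of cited documents.
references = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"C"},
}

def co_citation_counts(references):
    """For each pair of cited documents, count how many documents cite both."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

def coupling_counts(references):
    """For each pair of citing documents, count the references they share."""
    counts = Counter()
    for p, q in combinations(sorted(references), 2):
        shared = len(references[p] & references[q])
        if shared:
            counts[(p, q)] = shared
    return counts

print(co_citation_counts(references))
print(coupling_counts(references))
```

The counts can serve as edge weights in the derived graph, in the same way that repeated collaboration increases link weights in a co-authorship network.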

Future developments
While citation graphs have already had a noticeable impact on several areas of academia, they are likely to become more relevant in the future. As the body of published research grows, more traditional ways of searching for papers will become less effective at narrowing results down to the papers relevant to a particular topic. For example, text-based similarity can only go so far in selecting which papers are relevant to a topic, whereas adding citation graphs would allow higher priority to be given to papers that have many connections to other papers relevant to the topic.

However, developments like this face the same challenge as most applications of citation graphs: there is no standardized format or way of citing. This makes the construction of these graphs very difficult, since it requires complex software analysis to extract citations from papers. One proposed solution to this problem is to create open databases of citation information in a format that anyone can use and easily convert to a different form, for example a citation graph.
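
To show how easy such a conversion could be once citation data is available in structured form, the sketch below turns a small set of JSON records into a citation edge list. The record layout, with `id` and `references` fields, is a hypothetical format invented for this example and does not correspond to any specific open database.

```python
import json

# Hypothetical open-format records: each lists the ids of the works it cites.
records_json = """
[
  {"id": "doc-1", "references": []},
  {"id": "doc-2", "references": ["doc-1"]},
  {"id": "doc-3", "references": ["doc-1", "doc-2", "doc-9"]}
]
"""

def records_to_edges(records):
    """Convert records to (citing, cited) edges. References to documents
    missing from the collection are returned separately as unresolved."""
    known = {record["id"] for record in records}
    edges, unresolved = [], []
    for record in records:
        for ref in record["references"]:
            target = edges if ref in known else unresolved
            target.append((record["id"], ref))
    return edges, unresolved

edges, unresolved = records_to_edges(json.loads(records_json))
print(edges)
print(unresolved)
```

Keeping unresolved references separate, rather than silently dropping them, leaves room for later record linkage against other collections.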