User:CCLevy/Knowledge graph

A knowledge graph is a knowledge base that uses a graph-structured data model. Knowledge graphs are often used to store interlinked descriptions of entities — real-world objects, events, situations or abstract concepts — with free-form semantics, not fitting into a single traditional ontology.

Since the development of the Semantic Web, knowledge graphs have often been associated with linked open data projects, which focus on the connections between concepts and entities. They are also prominently associated with and used by search engines such as Google, Bing, and Yahoo; knowledge engines and question-answering services such as WolframAlpha, Apple's Siri, and Amazon Alexa; and social networks such as LinkedIn and Facebook.

History
The term was coined as early as 1972, in a discussion of how to build modular instructional systems for courses. In the late 1980s, Groningen and Twente universities jointly began a project called Knowledge Graphs, focusing on the design of semantic networks with edges restricted to a limited set of relations, to facilitate algebras on the graph. In subsequent decades, the distinction between semantic networks and knowledge graphs was blurred.

Some early knowledge graphs were topic-specific. In 1985, WordNet was founded, capturing semantic relationships between words and meanings, an application of this idea to language itself. In 2005, Marc Wick founded GeoNames to capture relationships between geographic names, locales, and associated entities.

In 2007, both DBpedia and Freebase were founded as graph-based repositories for general-purpose knowledge. DBpedia focused exclusively on data extracted from Wikipedia, while Freebase also included a range of public datasets. Neither described itself as a 'knowledge graph', but both developed and described related concepts.

In 2012, Google introduced their Knowledge Graph, building on DBpedia and Freebase among other sources. They later incorporated RDFa and microdata formats from indexed web pages, which in time were standardized around vocabularies published by schema.org. The Google Knowledge Graph became a successful complement to string-based search within Google, and its popularity online brought the term into more common use.

Definitions
There is no single commonly accepted definition of a knowledge graph. Popular definitions include:


 * General structure: A large network of entities, their semantic types, properties, and relationships.
 * Supporting reasoning over inferred ontologies: A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
 * Flexible relations among knowledge in topical domains: A knowledge graph (i) mainly describes real-world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other, and (iv) covers various topical domains.

Implementations
In addition to the above examples, the term has been used to describe open knowledge projects such as YAGO and Wikidata; federations like the Linked Open Data cloud; a range of commercial search tools, including Yahoo’s semantic search assistant Spark, Google’s Knowledge Vault, and Microsoft’s Satori; and the LinkedIn and Facebook entity graphs.

Using a knowledge graph for reasoning over data
When integrating supplemental data sources, a knowledge graph formally represents the meaning of information by describing concepts, relationships between things, and categories of things. This supports inference by following connected relations, rather than by repeatedly searching the tables of a relational database.
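
The idea of answering a question by following connected relations can be illustrated with a minimal sketch. It assumes facts are stored as (subject, predicate, object) triples in plain Python; the facts, the predicate names, and the helpers `build_index` and `follow` are hypothetical and do not correspond to any particular knowledge graph system.

```python
from collections import defaultdict

# Hypothetical example facts, stored as (subject, predicate, object) triples.
TRIPLES = [
    ("University of Groningen", "instance_of", "University"),
    ("University of Groningen", "located_in", "Groningen"),
    ("Groningen", "located_in", "Netherlands"),
    ("Netherlands", "part_of", "Europe"),
]


def build_index(triples):
    """Index triples by subject so that each hop is a dictionary lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def follow(index, start, path):
    """Follow a sequence of predicates from `start` and return the entities reached."""
    frontier = {start}
    for predicate in path:
        frontier = {
            obj
            for entity in frontier
            for (p, obj) in index.get(entity, [])
            if p == predicate
        }
    return frontier


index = build_index(TRIPLES)

# "In which country is the university located?" becomes two hops along located_in.
country = follow(index, "University of Groningen", ["located_in", "located_in"])
print(country)
```

Each hop here touches only the edges leaving the current entities, which is the contrast with repeated table searches that the paragraph above draws; real graph stores add persistence, indexing on predicates and objects, and a query language on top of this basic pattern.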

In machine learning, knowledge graphs can help find latent connections or augment a dataset with other connections between entities.

The benefits of using a knowledge graph
When integrating supplemental data sources:
 * A knowledge graph (KG) formally represents the meaning of information by describing concepts, relationships between things, and categories of things. Embedding these semantics with the data offers significant advantages, such as reasoning over data and handling heterogeneous data sources. Rules can be applied to a KG efficiently using graph queries: a graph query performs inference by following connected relations, rather than by repeatedly searching entire tables in a relational database. A KG also eases the integration of new heterogeneous data, which only requires adding new relationships between existing information and the new entities. This is particularly useful when integrating with popular linked open data sources such as Wikidata.
 * An SQL query is tightly coupled to a specific database and rigidly constrained by its datatypes. It can join tables on any columns whose datatypes match and extract data from them, and its result is generally a table. SPARQL is the standard query language and protocol for Linked Open Data on the Web. A SPARQL query is only loosely coupled to the database, which makes it easier to reuse, and it extracts data by following relations independently of datatype. Combined with logical property characteristics (transitive, symmetric, inverseOf, functional), it can also be used to generate additional knowledge graph content rather than only extract data. An inference-based query (a query over the existing asserted facts, without generating new facts by logic) can be faster than a reasoning-based query (a query over the existing facts plus the facts generated or discovered by logic).
 * Integrating heterogeneous data sources in a traditional database is intricate, because it requires redesigning the database tables, for example by changing their structure or adding new data. A SPARQL query, by contrast, expresses the relationships between entities in a way that matches a human understanding of the domain, so the semantic intention of the query is visible in the query itself. An SQL query reflects the specific structure of the database and is built by matching the relevant primary and foreign keys of tables; as a result, it loses the semantics of the relationships between entities.
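
The difference between querying asserted facts only and querying asserted plus derived facts can be sketched as follows. This is an illustrative, unoptimized fixed-point loop in plain Python, not how any particular reasoner or SPARQL engine is implemented; the function `materialize`, the example facts, and the predicate names are all hypothetical.

```python
def materialize(facts, transitive=(), symmetric=(), inverse_of=None):
    """Return the asserted facts plus all facts derivable from the given rules.

    transitive: predicates p where (a p b) and (b p c) imply (a p c)
    symmetric:  predicates p where (a p b) implies (b p a)
    inverse_of: mapping p -> q where (a p b) implies (b q a)
    """
    inverse_of = inverse_of or {}
    inferred = set(facts)
    changed = True
    while changed:  # repeat until a pass derives nothing new
        new = set()
        for s, p, o in inferred:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                for s2, p2, o2 in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        changed = not new <= inferred
        inferred |= new
    return inferred


# Hypothetical asserted facts.
ASSERTED = {
    ("Groningen", "part_of", "Netherlands"),
    ("Netherlands", "part_of", "Europe"),
    ("Netherlands", "borders", "Germany"),
}

closure = materialize(
    ASSERTED,
    transitive={"part_of"},
    symmetric={"borders"},
    inverse_of={"part_of": "has_part"},
)

# Derived by transitivity; present in the closure but never asserted.
print(("Groningen", "part_of", "Europe") in ASSERTED)
print(("Groningen", "part_of", "Europe") in closure)
```

A lookup against `ASSERTED` is a single set membership test, while building `closure` needs repeated passes over the data, which gives a rough intuition for why queries over asserted facts alone tend to be cheaper than queries that also require derived facts.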

When reinforcing a targeting algorithm based on machine learning (ML):
 * A KG helps find latent connections among items, which improves precision. It also helps identify user intentions that the ML output alone leaves hidden, which brings explainability to the targeting system.
 * A KG helps extend a user's interests in a reasonable way by using various relation types, which increases diversity. It also helps generate different knowledge representations oriented around the items of interest, for example by augmenting the dataset with distance values between entities.
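
One possible reading of "distance values between entities" is the number of hops on the shortest path between two nodes of the graph. The sketch below computes that with a breadth-first search, treating edges as undirected; this interpretation, the example edges, and the helpers `neighbors` and `distance` are assumptions made for illustration, not a method described by the source.

```python
from collections import defaultdict, deque

# Hypothetical user and item facts as (subject, predicate, object) triples.
EDGES = [
    ("user:alice", "likes", "item:running_shoes"),
    ("item:running_shoes", "category", "cat:sportswear"),
    ("item:yoga_mat", "category", "cat:sportswear"),
    ("item:blender", "category", "cat:kitchen"),
]


def neighbors(edges):
    """Build an undirected adjacency map, ignoring predicate labels."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in edges:
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


def distance(adjacency, source, target):
    """Return the shortest-path hop count, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


adjacency = neighbors(EDGES)

# Distances that could be attached to (user, item) pairs as extra features.
features = {
    item: distance(adjacency, "user:alice", item)
    for item in ("item:running_shoes", "item:yoga_mat", "item:blender")
}
print(features)
```

In this toy graph the yoga mat is reachable from the user through a shared category while the blender is not, so the path itself offers a human-readable explanation of why one item might be suggested over the other.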