User:Soper103/sandbox

This article is in development and should not yet be posted.

Title: DeepQA

DeepQA is a question-answering technique based on machine learning. The technology was developed as part of IBM's Watson computer system. It is important to distinguish between the two: Watson is the supercomputer that was built to win Jeopardy!, whereas DeepQA is the machine learning technique behind Watson. DeepQA was successful in Watson, but it has uses beyond Jeopardy!. The ability to ask a computer a question and have it provide an answer is not new, but machine learning techniques like DeepQA greatly expand the types of questions that computers can answer, with applications in fields such as healthcare, business, and law.

History
IBM has a history of building computers that take on challenges against top human competitors. The most famous example is Deep Blue, the computer that defeated world chess champion Garry Kasparov in a six-game match in 1997. In 2004, researchers at IBM started working on a computer that could master the game show Jeopardy!.

Designing a computer to win Jeopardy! posed challenges in question answering, natural language processing, machine learning, distributed computing, probabilistic reasoning, and hypothesis generation. IBM’s end product was Watson, a supercomputer that could beat Jeopardy! champions. By 2008, Watson had progressed enough that it could compete with human Jeopardy! contestants. By 2010, it could beat humans regularly. Watson was revealed to the world in February 2011, when it beat Brad Rutter and Ken Jennings on television. The DeepQA architecture is the basis for Watson’s question-answering abilities.

Description of Technology


The DeepQA architecture starts by taking a question and searching the databases it has ingested for evidence. It uses algorithms to collect this evidence and form multiple candidate answers to the question. It then assigns a confidence value to each answer and picks the answer with the highest confidence value. The televised Jeopardy! games illustrated this by displaying Watson’s top three answers and its confidence in each one. If it did not have high enough confidence in any answer, Watson would not answer the question. Watson has access to many databases, and the information in those databases is not always correct. For this reason, it performs error analysis: it cross-checks multiple databases in an attempt to determine the truth behind the evidence it collects. It performs all of these steps in a matter of seconds.
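
The ranking and thresholding steps described above can be illustrated with a minimal Python sketch. It is not IBM's implementation: the names Candidate, rank_candidates, and choose_answer are hypothetical, the evidence scores are made-up inputs, and averaging the scores into a confidence value is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    answer: str
    confidence: float


def rank_candidates(evidence_scores: Dict[str, List[float]]) -> List[Candidate]:
    """Turn per-answer evidence scores into ranked candidates.

    Assumption: confidence is the plain average of the evidence scores.
    """
    candidates = [
        Candidate(answer, sum(scores) / len(scores))
        for answer, scores in evidence_scores.items()
        if scores  # skip answers that have no supporting evidence
    ]
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


def choose_answer(candidates: List[Candidate], threshold: float = 0.5) -> Optional[str]:
    """Return the top answer, or None if its confidence is below the threshold."""
    if candidates and candidates[0].confidence >= threshold:
        return candidates[0].answer
    return None
```

Calling choose_answer(rank_candidates(scores)) returns either the highest-confidence answer or None, which mirrors Watson declining to answer when no candidate is confident enough.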

One of the most innovative and complex parts of DeepQA technology is its natural language processing (NLP). To be successful in Jeopardy!, Watson needed to be able to understand very complex questions. Jeopardy! categories often have puns and other phrasing that make them very challenging to comprehend. Humans are usually able to understand the phrasing and figure out how to answer the specific categories, but it is hard to program that kind of understanding into a computer. However, a system with this kind of understanding offers great benefits.

A traditional system would need to follow if-then logic, with specific rules for how to process the data that it encounters. Such a system can be costly to maintain as new knowledge is gained, because the rules must be updated. With NLP, a DeepQA system uses search algorithms to analyze new, unstructured information and use it in hypothesis generation. This way, the system can stay up to date with current information, as long as it has access to the databases with that information. NLP also helps by identifying words and phrases that are synonyms or describe the same concept.

DeepQA employs two techniques to complete NLP: English Slot Grammar (ESG) parsing and predicate-argument structure (PAS) building. ESG is a technique that parses a sentence and finds multiple ways that the sentence can be interpreted. It decides which interpretation is the most likely to be correct. PAS takes this interpretation and builds a simplified representation that can be interpreted by the rest of the system. For example, "The active/passive alternations such as 'John sold a fish' and 'A fish was sold by John' have slightly different structures in ESG but the same structure in PAS."
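
The effect of PAS building on the example sentences can be shown with a toy Python sketch. It does not use ESG or any real parser: the PAS tuple, the two regular expressions, and the to_pas function are hypothetical and only cover the two sentence shapes from the example. The point is that both sentences reduce to the same predicate-argument structure.

```python
import re
from typing import NamedTuple, Optional


class PAS(NamedTuple):
    predicate: str
    agent: str
    theme: str


# Hypothetical patterns that handle only "X was VERBed by Y" and "Y VERBed X".
PASSIVE = re.compile(
    r"^(?P<theme>.+?) was (?P<verb>\w+) by (?P<agent>.+?)\.?$", re.IGNORECASE
)
ACTIVE = re.compile(r"^(?P<agent>\w+) (?P<verb>\w+) (?P<theme>.+?)\.?$")


def to_pas(sentence: str) -> Optional[PAS]:
    """Map a simple active or passive sentence to one normalized structure."""
    text = sentence.strip()
    # Try the passive pattern first, because the active pattern is more general.
    match = PASSIVE.match(text) or ACTIVE.match(text)
    if match is None:
        return None
    return PAS(
        predicate=match["verb"].lower(),
        agent=match["agent"].lower(),
        theme=match["theme"].lower(),
    )
```

Here to_pas("John sold a fish") and to_pas("A fish was sold by John") return equal tuples, so later stages can treat the two phrasings as the same statement.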

The parsed sentence is then analyzed to detect critical components. For example, the system identifies named entities, terms that determine what kind of answer must be provided, and sub-questions that may need to be answered separately. These elements must be detected accurately for the system to have any hope of arriving at the correct answer.
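
A rough idea of this analysis step is sketched below in Python. The heuristics are assumptions chosen for illustration and are not how DeepQA works: runs of capitalized words are treated as named entities, and the word following "this" or "these" is treated as the term that indicates the kind of answer. The names QuestionAnalysis, analyze, and NOT_ENTITIES are hypothetical, and sub-question detection is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Capitalized words that usually start a clue and are not entities (assumed list).
NOT_ENTITIES = {"This", "These", "The", "A", "An"}


@dataclass
class QuestionAnalysis:
    named_entities: List[str]
    answer_type: Optional[str]


def analyze(clue: str) -> QuestionAnalysis:
    """Extract named entities and an answer-type term with simple heuristics."""
    # Assumption: a run of capitalized words is one named entity.
    runs = re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", clue)
    entities = [run for run in runs if run not in NOT_ENTITIES]
    # Assumption: the word after "this" or "these" names the kind of answer.
    type_match = re.search(r"\b(?:this|these)\s+(\w+)", clue, re.IGNORECASE)
    answer_type = type_match.group(1).lower() if type_match else None
    return QuestionAnalysis(named_entities=entities, answer_type=answer_type)
```

For the made-up clue "This author wrote Moby Dick in 1851", the sketch reports "Moby Dick" as a named entity and "author" as the kind of answer expected.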

Content Adaptation
In order for a complex technology like DeepQA to be repurposed, it must undergo adaptation. No matter what field DeepQA is being applied to, the adaptation process consists of three primary steps. A system must go through these three steps repeatedly to continuously improve and optimize its results.
 * 1) Content adaptation – This involves acquiring the data sets to be added to the system so that they can support hypothesis generation.
 * 2) Training adaptation – This involves having a test question set that can be used to evaluate the system. Using machine learning techniques, the system learns to extract better evidence and more accurately rank answers based on confidence levels. For Watson, this question set was made up of past Jeopardy! questions.
 * 3) Functional adaptation – This involves using natural language processing to allow the system to understand input questions. This step includes algorithms to analyze questions, score hypotheses, and generate evidence.
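
The evaluation part of training adaptation can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used for Watson: the evaluate function, the answer_fn callback, and the two reported measures (fraction of questions attempted and precision on attempted questions) are choices made for this example.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def evaluate(
    answer_fn: Callable[[str], Optional[str]],
    question_set: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Score a question-answering function against (question, expected) pairs.

    answer_fn returns an answer string, or None when it declines to answer.
    """
    total = attempted = correct = 0
    for question, expected in question_set:
        total += 1
        guess = answer_fn(question)
        if guess is None:
            continue  # the system was not confident enough to answer
        attempted += 1
        if guess.strip().lower() == expected.strip().lower():
            correct += 1
    return {
        "attempted": attempted / total if total else 0.0,
        "precision": correct / attempted if attempted else 0.0,
    }
```

Running evaluate on the same question set before and after a change to the system gives a simple way to check whether the change improved the results.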

The original Watson system was not restricted to a single subject area. This made it very difficult to add enough reference sources for the system to have the knowledge to answer any Jeopardy! question. With more focused applications, like those described below, this is less of an issue, but it is still necessary to find appropriate sources for any DeepQA system to reference. The process of adding sources to the system can be broken down into three steps:
 * 1) Source acquisition – "an iterative development process of acquiring new collections to cover salient topics deemed to be gaps in existing resources based on principled error analysis"
 * 2) Source transformation – "the process in which information is extracted from existing sources, either as a whole or in part, and is represented in a form that the system can most easily use"
 * 3) Source expansion – "attempts to increase the coverage in the content of each known topic by adding new information as well as lexical and syntactic variations of existing information extracted from external large collections"

Healthcare
One of the first proposed applications for DeepQA technology is in the field of healthcare. Such a system would adapt the DeepQA technology to search medical databases in order to answer medical questions and propose diagnoses. Various other healthcare applications of DeepQA have also been suggested.

Business
DeepQA has a number of applications in business. One application would use DeepQA to search through legal documents to find laws that might help or limit the ability of a business to achieve a goal. The system would look at business goals and search through sets of legal information to determine how various laws and regulations would affect each goal. It would also be able to look at trends in legal action to see how future legislation might affect the business.

Law
One of the main strengths of DeepQA is its ability to gather evidence in support of an idea. In the field of law, this would mean finding evidence in legal documents to build legal arguments for use in court cases. Such a system would look at the legal information in a case and at other compiled legal records (laws, legal precedents, etc.) to come up with legal arguments that are relevant to the case.