User:Wdan14/Hallucination (artificial intelligence)

Scientific Research
Hallucinations by artificial intelligence models can also cause problems in academic and scientific research. In particular, models like ChatGPT have been recorded in multiple cases citing sources that are either incorrect or do not exist. A study published in the Cureus Journal of Medical Science showed that out of 178 total references cited by ChatGPT, 69 returned an incorrect or nonexistent DOI. An additional 28 had no known DOI and could not be located in a Google search.

Another instance was documented by Dr. Jerome Goddard of Mississippi State University. During a lab experiment, ChatGPT provided him and his research team with questionable information about ticks. Unsure about the validity of the response, they asked for the source of the information. On inspection, it became apparent that not only the DOI but also the names of the authors had been hallucinated. Some of the named authors were contacted and confirmed that they had no knowledge of the paper's existence. Goddard says that, "in [ChatGPT's] current state of development, physicians and biomedical researchers should NOT ask ChatGPT for sources, references, or citations on a particular topic. Or, if they do, all such references should be carefully vetted for accuracy." In his view, these language models are not ready for use in academic research and should be handled carefully.

In addition to providing incorrect or nonexistent references, ChatGPT also hallucinates the contents of some real reference material. A study that analyzed a total of 115 references provided by ChatGPT documented that 47% of them were fabricated. Another 46% cited real references but extracted incorrect information from them. Only the remaining 7% of references were cited correctly and provided accurate information. ChatGPT has also been observed to "double-down" on incorrect information. When asked about a mistake that may have been hallucinated, it sometimes tries to correct itself, but at other times it claims the response is correct and provides even more misleading information.

Hallucinated articles generated by language models pose a further problem because it can be very difficult for a human reader to tell whether an article was generated by an AI. To demonstrate this, a group of researchers at Northwestern University generated 50 research abstracts based on existing reports and analyzed their originality. Plagiarism detectors gave the generated abstracts an originality score of 100%, meaning that the information presented appeared to be completely original. Other software designed to detect AI-generated text correctly identified the generated abstracts with an accuracy of only 66%. Research scientists performed similarly, correctly identifying the generated abstracts 68% of the time. From this information, the authors of this study concluded, "[t]he ethical and acceptable boundaries of ChatGPT’s use in scientific writing remain unclear, although some publishers are beginning to lay down policies." Because AI can fabricate research that goes undetected, its use in research will make determining the originality of work more difficult and will require new policies regulating its use.

Because AI-generated text can in some cases pass as real scientific research, even when presented to working researchers, AI hallucinations present problems for the application of language models in academic and scientific research. The high likelihood of returning nonexistent reference material and incorrect information may require limitations to be placed on these language models. Some say that these events are more akin to "fabrications" and "falsifications" than to hallucinations, and that the use of these language models presents a risk to the integrity of the field as a whole.