Generative artificial intelligence



Generative artificial intelligence (generative AI, GenAI, or GAI) is artificial intelligence capable of generating text, images, videos, or other data using generative models, often in response to prompts. Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.

Improvements in transformer-based deep neural networks, particularly large language models (LLMs), enabled an AI boom of generative AI systems in the early 2020s. These include chatbots such as ChatGPT, Copilot, Gemini and LLaMA, text-to-image artificial intelligence image generation systems such as Stable Diffusion, Midjourney and DALL-E, and text-to-video AI generators such as Sora. Companies such as OpenAI, Anthropic, Microsoft, Google, and Baidu as well as numerous smaller firms have developed generative AI models.

Generative AI has uses across a wide range of industries, including software development, healthcare, finance, entertainment, customer service, sales and marketing, art, writing, fashion, and product design. However, concerns have been raised about the potential misuse of generative AI such as cybercrime, the use of fake news or deepfakes to deceive or manipulate people, and the mass replacement of human jobs.

History
The academic discipline of artificial intelligence was established at a research workshop held at Dartmouth College in 1956 and has experienced several waves of advancement and optimism in the decades since. Since its inception, researchers in the field have raised philosophical and ethical arguments about the nature of the human mind and the consequences of creating artificial beings with human-like intelligence; these issues have previously been explored by myth, fiction and philosophy since antiquity. The concept of automated art dates back at least to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria were described as having designed machines capable of writing text, generating sounds, and playing music. The tradition of creative automatons has flourished throughout history, exemplified by Maillardet's automaton created in the early 1800s.

Artificial Intelligence is an idea that has been captivating society since the mid-20th century. It began with science fiction familiarizing the world with the concept but the idea was not fully seen in the scientific manner until Alan Turing, a polymath, was curious about the feasibility of the concept. Turing's groundbreaking 1950 paper, "Computing Machinery and Intelligence," posed fundamental questions about machine reasoning similar to human intelligence, significantly contributing to the conceptual groundwork of AI. The development of AI was not very rapid at first because of the high costs and the fact that computers were not able to store commands. This changed during the 1956 Dartmouth Summer Research Project on AI where there was an inspiring call for AI research, setting the precedent for two decades of rapid advancements in the field.

Since the founding of AI in the 1950s, artists and researchers have used artificial intelligence to create artistic works. By the early 1970s, Harold Cohen was creating and exhibiting generative AI works created by AARON, the computer program Cohen created to generate paintings.

Markov chains have long been used to model natural languages since their development by Russian mathematician Andrey Markov in the early 20th century. Markov published his first paper on the topic in 1906, and analyzed the pattern of vowels and consonants in the novel Eugeny Onegin using Markov chains. Once a Markov chain is learned on a text corpus, it can then be used as a probabilistic text generator.

The field of machine learning often uses statistical models, including generative models, to model and predict data. Beginning in the late 2000s, the emergence of deep learning drove progress and research in image classification, speech recognition, natural language processing and other tasks. Neural networks in this era were typically trained as discriminative models, due to the difficulty of generative modeling.

In 2014, advancements such as the variational autoencoder and generative adversarial network produced the first practical deep neural networks capable of learning generative models, as opposed to discriminative ones, for complex data such as images. These deep generative models were the first to output not only class labels for images but also entire images.

In 2017, the Transformer network enabled advancements in generative models compared to older Long-Short Term Memory models, leading to the first generative pre-trained transformer (GPT), known as GPT-1, in 2018. This was followed in 2019 by GPT-2 which demonstrated the ability to generalize unsupervised to many different tasks as a Foundation model.

In 2021, the release of DALL-E, a transformer-based pixel generative model, followed by Midjourney and Stable Diffusion marked the emergence of practical high-quality artificial intelligence art from natural language prompts.

In March 2023, GPT-4 was released. A team from Microsoft Research argued that "it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system". Other scholars have disputed that GPT-4 reaches this threshold, calling generative AI "still far from reaching the benchmark of ‘general human intelligence’" as of 2023. In 2023, Meta released an AI model called ImageBind which combines data from text, images, video, thermal data, 3D data, audio, and motion which is expected to allow for more immersive generative AI content. According to a survey by SAS and Coleman Parkes Research, China is leading the world in adopting generative AI, with 83% of Chinese respondents using the technology, surpassing the global average of 54% and the U.S. at 65%. A UN report revealed China filed over 38,000 GenAI patents from 2014 to 2023, far exceeding the U.S.

Modalities
A generative AI system is constructed by applying unsupervised or self-supervised machine learning to a data set. The capabilities of a generative AI system depend on the modality or type of the data set used.

Generative AI can be either unimodal or multimodal; unimodal systems take only one type of input, whereas multimodal systems can take more than one type of input. For example, one version of OpenAI's GPT-4 accepts both text and image inputs.

Text
Generative AI systems trained on words or word tokens include GPT-3, LaMDA, LLaMA, BLOOM, GPT-4, Gemini and others (see List of large language models). They are capable of natural language processing, machine translation, and natural language generation and can be used as foundation models for other tasks. Data sets include BookCorpus, Wikipedia, and others (see List of text corpora).

Code
In addition to natural language text, large language models can be trained on programming language text, allowing them to generate source code for new computer programs. Examples include OpenAI Codex.

Images
Producing high-quality visual art is a prominent application of generative AI. Generative AI systems trained on sets of images with text captions include Imagen, DALL-E, Midjourney, Adobe Firefly, Stable Diffusion and others (see Artificial intelligence art, Generative art, and Synthetic media). They are commonly used for text-to-image generation and neural style transfer. Datasets include LAION-5B and others (see List of datasets in computer vision and image processing).

Audio
Generative AI can also be trained extensively on audio clips to produce natural-sounding speech synthesis and text-to-speech capabilities, exemplified by ElevenLabs' context-aware synthesis tools or Meta Platform's Voicebox. Generative AI systems such as MusicLM and MusicGen can also be trained on the audio waveforms of recorded music along with text annotations, in order to generate new musical samples based on text descriptions such as a calming violin melody backed by a distorted guitar riff.

Music
Audio deepfakes of lyrics have been generated, like the song Savages, which used AI to mimic rapper Jay-Z's vocals. Music artist's instrumentals and lyrics are copyrighted but their voices aren't protected from regenerative AI yet, raising a debate about whether artists should get royalties from audio deepfakes.

Many AI music generators have been created that can be generated using a text phrase, genre options, and looped libraries of bars and riffs.

Video
Generative AI trained on annotated video can generate temporally-coherent, detailed and photorealistic video clips. Examples include Sora by OpenAI, Gen-1 and Gen-2 by Runway, and Make-A-Video by Meta Platforms.

Molecules
Generative AI systems can be trained on sequences of amino acids or molecular representations such as SMILES representing DNA or proteins. These systems, such as AlphaFold, are used for protein structure prediction and drug discovery. Datasets include various biological datasets.

Robotics
Generative AI can also be trained on the motions of a robotic system to generate new trajectories for motion planning or navigation. For example, UniPi from Google Research uses prompts like "pick up blue bowl" or "wipe plate with yellow sponge" to control movements of a robot arm. Multimodal "vision-language-action" models such as Google's RT-2 can perform rudimentary reasoning in response to user prompts and visual input, such as picking up a toy dinosaur when given the prompt pick up the extinct animal at a table filled with toy animals and other objects.

Data
Generative AI systems are often used to develop synthetic data as an alternative to data produced by real-world events. Such data can be deployed to validate mathematical models and to train machine learning models while preserving user privacy, including for structured data. The approach is not limited to text generation; image generation has been employed to train computer vision models.

Computer aided design
Artificially intelligent computer-aided design (CAD) can use text-to-3D, image-to-3D, and video-to-3D to automate 3D modeling. AI-based CAD libraries could also be developed using linked open data of schematics and diagrams. AI CAD assistants are used as tools to help streamline workflow.

Planning
The terms generative AI planning or generative planning were used in the 1980s and 1990s to refer to AI planning systems, especially computer-aided process planning, used to generate sequences of actions to reach a specified goal.

Generative AI planning systems used symbolic AI methods such as state space search and constraint satisfaction and were a "relatively mature" technology by the early 1990s. They were used to generate crisis action plans for military use, process plans for manufacturing and decision plans such as in prototype autonomous spacecraft.

Software and hardware
Generative AI models are used to power chatbot products such as ChatGPT, programming tools such as GitHub Copilot, text-to-image products such as Midjourney, and text-to-video products such as Runway Gen-2. Generative AI features have been integrated into a variety of existing commercially available products such as Microsoft Office (Microsoft Copilot), Google Photos, and the Adobe Suite (Adobe Firefly). Many generative AI models are also available as open-source software, including Stable Diffusion and the LLaMA language model.

Smaller generative AI models with up to a few billion parameters can run on smartphones, embedded devices, and personal computers. For example, LLaMA-7B (a version with 7 billion parameters) can run on a Raspberry Pi 4 and one version of Stable Diffusion can run on an iPhone 11.

Larger models with tens of billions of parameters can run on laptop or desktop computers. To achieve an acceptable speed, models of this size may require accelerators such as the GPU chips produced by NVIDIA and AMD or the Neural Engine included in Apple silicon products. For example, the 65 billion parameter version of LLaMA can be configured to run on a desktop PC.

The advantages of running generative AI locally include protection of privacy and intellectual property, and avoidance of rate limiting and censorship. The subreddit r/LocalLLaMA in particular focuses on using consumer-grade gaming graphics cards through such techniques as compression. That forum is one of only two sources Andrej Karpathy trusts for language model benchmarks. Yann LeCun has advocated open-source models for their value to vertical applications and for improving AI safety.

Language models with hundreds of billions of parameters, such as GPT-4 or PaLM, typically run on datacenter computers equipped with arrays of GPUs (such as NVIDIA's H100) or AI accelerator chips (such as Google's TPU). These very large models are typically accessed as cloud services over the Internet.

In 2022, the United States New Export Controls on Advanced Computing and Semiconductors to China imposed restrictions on exports to China of GPU and AI accelerator chips used for generative AI. Chips such as the NVIDIA A800 and the Biren Technology BR104 were developed to meet the requirements of the sanctions.

There is free software on the market capable of recognizing text generated by generative artificial intelligence (such as GPTZero), as well as images, audio or video coming from it. Despite claims of accuracy, both free and paid AI text detectors have frequently produced false positives, mistakenly accusing students of submitting AI-generated work.

Law and regulation
In the United States, a group of companies including OpenAI, Alphabet, and Meta signed a voluntary agreement with the Biden administration in July 2023 to watermark AI-generated content. In October 2023, Executive Order 14110 applied the Defense Production Act to require all US companies to report information to the federal government when training certain high-impact AI models.

In the European Union, the proposed Artificial Intelligence Act includes requirements to disclose copyrighted material used to train generative AI systems, and to label any AI-generated output as such.

In China, the Interim Measures for the Management of Generative AI Services introduced by the Cyberspace Administration of China regulates any public-facing generative AI. It includes requirements to watermark generated images or videos, regulations on training data and label quality, restrictions on personal data collection, and a guideline that generative AI must "adhere to socialist core values".

Training with copyrighted content
Generative AI systems such as ChatGPT and Midjourney are trained on large, publicly available datasets that include copyrighted works. AI developers have argued that such training is protected under fair use, while copyright holders have argued that it infringes their rights.

Proponents of fair use training have argued that it is a transformative use and does not involve making copies of copyrighted works available to the public. Critics have argued that image generators such as Midjourney can create nearly-identical copies of some copyrighted images, and that generative AI programs compete with the content they are trained on.

As of 2024, several lawsuits related to the use of copyrighted material in training are ongoing. Getty Images has sued Stability AI over the use of its images to train Stable diffusion. Both the Authors Guild and The New York Times have sued Microsoft and OpenAI over the use of their works to train ChatGPT.

Copyright of AI-generated content
A separate question is whether AI-generated works can qualify for copyright protection. The United States Copyright Office has ruled that works created by artificial intelligence without any human input cannot be copyrighted, because they lack human authorship. However, the office has also begun taking public input to determine if these rules need to be refined for generative AI.

Concerns
The development of generative AI has raised concerns from governments, businesses, and individuals, resulting in protests, legal actions, calls to pause AI experiments, and actions by multiple governments. In a July 2023 briefing of the United Nations Security Council, Secretary-General António Guterres stated "Generative AI has enormous potential for good and evil at scale", that AI may "turbocharge global development" and contribute between $10 and $15 trillion to the global economy by 2030, but that its malicious use "could cause horrific levels of death and destruction, widespread trauma, and deep psychological damage on an unimaginable scale".

Job losses
From the early days of the development of AI, there have been arguments put forward by ELIZA creator Joseph Weizenbaum and others about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculations and qualitative, value-based judgements. In April 2023, it was reported that image generation AI has resulted in 70% of the jobs for video game illustrators in China being lost. In July 2023, developments in generative AI contributed to the 2023 Hollywood labor disputes. Fran Drescher, president of the Screen Actors Guild, declared that "artificial intelligence poses an existential threat to creative professions" during the 2023 SAG-AFTRA strike. Voice generation AI has been seen as a potential challenge to the voice acting sector.

The intersection of AI and employment concerns among underrepresented groups globally remains a critical facet. While AI promises efficiency enhancements and skill acquisition, concerns about job displacement and biased recruiting processes persist among these groups, as outlined in surveys by Fast Company. To leverage AI for a more equitable society, proactive steps encompass mitigating biases, advocating transparency, respecting privacy and consent, and embracing diverse teams and ethical considerations. Strategies involve redirecting policy emphasis on regulation, inclusive design, and education's potential for personalized teaching to maximize benefits while minimizing harms.

Racial and gender bias
Generative AI models can reflect and amplify any cultural bias present in the underlying data. For example, a language model might assume that doctors and judges are male, and that secretaries or nurses are female, if those biases are common in the training data. Similarly, an image model prompted with the text "a photo of a CEO" might disproportionately generate images of white male CEOs, if trained on a racially biased data set. A number of methods for mitigating bias have been attempted, such as altering input prompts and reweighting training data.

Deepfakes
Deepfakes (a portmanteau of "deep learning" and "fake" ) are AI-generated media that take a person in an existing image or video and replace them with someone else's likeness using artificial neural networks. Deepfakes have garnered widespread attention and concerns for their uses in deepfake celebrity pornographic videos, revenge porn, fake news, hoaxes, health disinformation, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.

In July 2023, the fact-checking company Logically found that the popular generative AI models Midjourney, DALL-E 2 and Stable Diffusion would produce plausible disinformation images when prompted to do so, such as images of electoral fraud in the United States and Muslim women supporting India's Hindu nationalist Bharatiya Janata Party.

In April 2024, a paper proposed to use blockchain (distributed ledger technology) to promote "transparency, verifiability, and decentralization in AI development and usage".

Audio deepfakes
Instances of users abusing software to generate controversial statements in the vocal style of celebrities, public officials, and other famous individuals have raised ethical concerns over voice generation AI. In response, companies such as ElevenLabs have stated that they would work on mitigating potential abuse through safeguards and identity verification.

Concerns and fandom have spawned from AI-generated music. The same software used to clone voices has been used on famous musicians' voices to create songs that mimic their voices, gaining both tremendous popularity and criticism. Similar techniques have also been used to create improved quality or full-length versions of songs that have been leaked or have yet to be released.

Generative AI has also been used to create new digital artist personalities, with some of these receiving enough attention to receive record deals at major labels. The developers of these virtual artists have also faced their fair share of criticism for their personified programs, including backlash for "dehumanizing" an artform, and also creating artists which create unrealistic or immoral appeals to their audiences.

Cybercrime
Generative AI's ability to create realistic fake content has been exploited in numerous types of cybercrime, including phishing scams. Deepfake video and audio have been used to create disinformation and fraud. Former Google fraud czar Shuman Ghosemajumder has predicted that while deepfake videos initially created a stir in the media, they would soon become commonplace, and as a result, more dangerous. Additionally, large-language models and other forms of text-generation AI have been at a broad scale to create fake reviews on e-commerce websites to boost ratings. Cybercriminals have created large language models focused on fraud, including WormGPT and FraudGPT.

Recent research done in 2023 has revealed that generative AI has weaknesses that can be manipulated by criminals to extract harmful information bypassing ethical safeguards. The study presents example attacks done on ChatGPT including Jailbreaks and reverse psychology. Additionally, malicious individuals can use ChatGPT for social engineering attacks and phishing attacks, revealing the harmful side of these technologies.

Reliance on industry giants
Training frontier AI models requires an enormous amount of computing power. Usually only Big Tech companies have the financial resources to make such investments. Smaller startups such as Cohere and OpenAI end up buying access to data centers from Google and Microsoft respectively.

Misuse in journalism
In January 2023, Futurism.com broke the story that CNET had been using an undisclosed internal AI tool to write at least 77 of its stories; after the news broke, CNET posted corrections to 41 of the stories.

In April 2023, the German tabloid Die Aktuelle published a fake AI-generated interview with former racing driver Michael Schumacher, who had not made any public appearances since 2013 after sustaining a brain injury in a skiing accident. The story included two possible disclosures: the cover included the line "deceptively real", and the interview included an acknowledgment at the end that it was AI-generated. The editor-in-chief was fired shortly thereafter amid the controversy.

Other outlets that have published articles whose content and/or byline have been confirmed or suspected to be created by generative AI models – often with false content, errors, and/or non-disclosure of generative AI use - include:


 * NewsBreak
 * outlets owned by Arena Group
 * Sports Illustrated
 * TheStreet
 * Men's Journal
 * B&H Photo
 * outlets owned by Gannett
 * The Columbus Dispatch
 * Reviewed
 * USA Today
 * MSN
 * News Corp
 * outlets owned by G/O Media
 * Gizmodo
 * Jalopnik
 * A.V. Club
 * The Irish Times
 * outlets owned by Red Ventures
 * Bankrate
 * BuzzFeed
 * Newsweek
 * Hoodline
 * outlets owned by Outside Inc.
 * Yoga Journal
 * Backpacker
 * Clean Eating
 * Hollywood Life
 * Us Weekly
 * The Los Angeles Times
 * outlets owned by McClatchy
 * Miami Herald
 * Sacramento Bee
 * Tacoma News Tribune
 * The Rock Hill Herald
 * The Modesto Bee
 * Fort Worth Star-Telegram
 * Merced Sun-Star
 * Ledger-Enquirer
 * The Kansas City Star
 * Raleigh News & Observer
 * outlets owned by Ziff Davis
 * PC Magazine
 * Mashable
 * AskMen
 * outlets owned by Hearst
 * Good Housekeeping
 * outlets owned by IAC Inc.
 * People 
 * Parents 
 * Food & Wine 
 * InStyle
 * Real Simple
 * Travel + Leisure 
 * Better Homes & Gardens
 * Southern Living 

In May 2024, Futurism noted that a content management system video by AdVon Commerce, who had used generative AI to produce articles for many of the aforementioned outlets, appeared to show that they "had produced tens of thousands of articles for more than 150 publishers." 

News broadcasters in Kuwait, Greece, South Korea, India, China and Taiwan have presented news with anchors based on Generative AI models, prompting concerns about job losses for human anchors and audience trust in news that has historically been influenced by parasocial relationships with broadcasters, content creators or social media influencers. Algorithmically-generated anchors have also been used by allies of ISIS for their broadcasts.

In 2023, Google reportedly pitched a tool to news outlets that claimed to "produce news stories" based on input data provided, such as "details of current events". Some news company executives who viewed the pitch described it as "[taking] for granted the effort that went into producing accurate and artful news stories."

In February 2024, Google launched a program to pay small publishers to write three articles per day using a beta generative AI model. The program does not require the knowledge or consent of the websites that the publishers are using as sources, nor does it require the published articles to be labeled as being created or assisted by these models.

Many defunct news sites (The Hairpin, The Frisky, Apple Daily) have undergone cybersquatting, with articles created by generative AI.

United States Senators Richard Blumenthal and Amy Klobuchar have expressed concern that generative AI could have a harmful impact on local news. In July 2023, OpenAI partnered with the American Journalism Project to fund local news outlets for experimenting with generative AI, with Axios noting the possibility of generative AI companies creating a dependency for these news outlets.

Meta AI, a chatbot based on Llama 3 which summarizes news stories, was noted by The Washington Post to copy sentences from those stories without direct attribution and to potentially further decrease the traffic of online news outlets.

In response to potential pitfalls around the use and misuse of generative AI in journalism, outlets such as Wired, Associated Press and The Guardian have published guidelines around how they plan to use and not use generative AI in their work.

In June 2024, Reuters Institute published their Digital New Report for 2024. In a survey of people in America and Europe, Reuters Institute reports that 52% and 47% respectively are uncomfortable with news produced by "mostly AI with some human oversight", and 23% and 15% respectively report being comfortable. 42% of Americans and 33% of Europeans reported that they were comfortable with news produced by "mainly human with some help from AI". The results of global surveys reported that people were more uncomfortable with news topics including politics (46%), crime (43%), and local news (37%) produced by AI than other news topics.