User:Stellaathena/sandbox eai

EleutherAI is a grass-roots non-profit artificial intelligence (AI) research group. The research group, considered an open-source version of OpenAI, conducts research in the field of AI with the stated goal of promoting and developing friendly AI in a way that benefits humanity as a whole. The group was formed in a Discord server in July 2020, two years before it was officially incorporated. Despite a lack of formal funding or organizational structure, it rapidly became a leading player in natural language processing research, releasing the largest open-source GPT-3-like model in the world on March 21, 2021.

History
EleutherAI began as a Discord server on July 7, 2020 under the tentative name "LibreAI" before rebranding to "EleutherAI" later that month.

On December 30, 2020, EleutherAI released the Pile, a curated dataset of diverse text for training large language models. While the accompanying paper referenced the existence of the GPT-Neo models, the models themselves were not released until March 21, 2021. According to a retrospective written several months later, the authors did not anticipate that "people would care so much about our 'small models'".


Following the release of DALL-E by OpenAI in January 2021, EleutherAI started working on text-to-image synthesis models. When OpenAI declined to release DALL-E publicly, EleutherAI's Katherine Crowson and digital artist Ryan Murdock developed a technique for using CLIP (another model developed by OpenAI) to convert regular image generation models into text-to-image synthesis ones. Building on ideas dating back to Google's DeepDream, they found their first major success combining CLIP with another publicly available model called VQGAN. Crowson released the technology by tweeting notebooks demonstrating the technique, which people could run for free without any special equipment.
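At its core, CLIP guidance treats generation as an optimization problem: repeatedly adjust a generator's latent input in the direction that increases CLIP's similarity score between the generated image and the text prompt. The toy sketch below illustrates only the shape of that loop: a simple quadratic function stands in for CLIP's image–text similarity, the "generator" is the identity, and a finite-difference gradient replaces the automatic differentiation through CLIP and VQGAN that the real technique relies on.

```python
# Toy illustration of a CLIP-guided generation loop (not the real models).
# A latent vector is nudged, step by step, toward higher similarity with a
# target. In VQGAN-CLIP the latent parameterizes VQGAN's decoder and the
# score is CLIP's image-text similarity; here both are simple stand-ins.

def similarity(latent, target):
    # Stand-in for CLIP: higher when the latent is closer to `target`.
    return -sum((l - t) ** 2 for l, t in zip(latent, target))

def clip_guided_optimize(latent, target, steps=200, lr=0.1, eps=1e-4):
    latent = list(latent)
    for _ in range(steps):
        # Finite-difference gradient of the score w.r.t. each latent dim
        # (real implementations use autodiff through CLIP and the generator).
        grad = []
        for i in range(len(latent)):
            bumped = latent.copy()
            bumped[i] += eps
            grad.append((similarity(bumped, target) - similarity(latent, target)) / eps)
        # Gradient ascent: move the latent toward higher similarity.
        latent = [l + lr * g for l, g in zip(latent, grad)]
    return latent

result = clip_guided_optimize([0.0, 0.0, 0.0], target=[1.0, -2.0, 0.5])
print([round(x, 2) for x in result])  # → [1.0, -2.0, 0.5]
```

The loop converges on whatever the score function favors; swapping in CLIP as the score and a generative model as the image parameterization is what turns this generic ascent into text-to-image synthesis.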

Motives
Some scientists, such as Stephen Hawking and Stuart Russell, have articulated concerns that if advanced AI someday gains the ability to re-design itself at an ever-increasing rate, an unstoppable "intelligence explosion" could lead to human extinction. Musk characterizes AI as humanity's "biggest existential threat." OpenAI's founders structured it as a non-profit so that they could focus its research on creating a positive long-term human impact.

Musk and Altman have stated they are partly motivated by concerns about the existential risk from artificial general intelligence. OpenAI states that "it's hard to fathom how much human-level AI could benefit society," and that it is equally difficult to comprehend "how much it could damage society if built or used incorrectly". Research on safety cannot safely be postponed: "because of AI's surprising history, it's hard to predict when human-level AI might come within reach." OpenAI states that AI "should be an extension of individual human wills and, in the spirit of liberty, as broadly and evenly distributed as possible...". Co-chair Sam Altman expects the decades-long project to surpass human intelligence.

Vishal Sikka, former CEO of Infosys, stated that an "openness" where the endeavor would "produce results generally in the greater interest of humanity" was a fundamental requirement for his support, and that OpenAI "aligns very nicely with our long-held values" and their "endeavor to do purposeful work". Cade Metz of Wired suggests that corporations such as Amazon may be motivated by a desire to use open-source software and data to level the playing field against corporations such as Google and Facebook that own enormous supplies of proprietary data. Altman states that Y Combinator companies will share their data with OpenAI.

Having previously been a 501(c)(3) nonprofit organization, OpenAI became a for-profit company called OpenAI LP in 2019 in order to secure additional funding, while remaining controlled by a non-profit called OpenAI Inc in a structure that OpenAI calls "capped-profit".

Research
EleutherAI's research tends to focus on large scale generative models (text, text-to-image) and interpretability and alignment of such models.

GPT-3 Replications
EleutherAI's most prominent research relates to its work to train open-source large language models inspired by OpenAI's GPT-3. EleutherAI has released models with 125 million, 1.3 billion, 2.7 billion, 6 billion, and 20 billion parameters.


 * GPT-Neo (125M, 1.3B, 2.7B): released in March 2021, it was the largest open-source GPT-3-style language model in the world at the time of release.
 * GPT-J (6B): released in June 2021, it was the largest open-source GPT-3-style language model in the world at the time of release.
 * GPT-NeoX (20B): released in February 2022, it was the largest open-source language model in the world at the time of release.

While the overwhelming majority of large language models are trained in either English or Chinese, EleutherAI also trains models in other languages, such as the Korean-language Polyglot-Ko series, trained in collaboration with the Korean NLP company TUNiB.

CLIP-Guided Image Generation
When OpenAI declined to release DALL-E publicly, EleutherAI researchers paired OpenAI's publicly released CLIP model with existing image generators such as VQGAN to perform text-to-image synthesis, distributing the resulting VQGAN-CLIP technique as notebooks that anyone could run for free.

The Pile
The Pile is an 800 GiB dataset designed for training large language models. It was originally developed to train EleutherAI's GPT-Neo models, but has since been widely used to train models by researchers at Microsoft, Meta AI, Stanford University, and the Beijing Academy of Artificial Intelligence. Compared to other datasets, the Pile's main distinguishing features are that it is a curated selection of data chosen by researchers at EleutherAI to contain information they thought language models should learn, and that it is the only such dataset thoroughly documented by the researchers who developed it.
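The released files are zstandard-compressed JSON Lines: each line is a JSON object carrying the document text plus metadata recording which component dataset it came from. The sketch below parses records in that layout; the field names (`text`, `meta`, `pile_set_name`) follow the released files, but treat the exact keys as an assumption, and note that reading the actual archives additionally requires a zstandard decompressor.

```python
# Sketch: iterating over Pile-style JSON Lines records. Each line is a JSON
# object with a "text" field (the document) and a "meta" field naming the
# component dataset it was drawn from ("pile_set_name" is assumed here).
# The released .jsonl.zst files would first need zstandard decompression.
import json

def iter_pile_records(lines):
    """Yield (text, component_name) pairs from Pile-style JSON Lines."""
    for line in lines:
        record = json.loads(line)
        yield record["text"], record["meta"]["pile_set_name"]

# Two hand-written records in the assumed layout, for illustration only.
sample = [
    '{"text": "def add(a, b): return a + b", "meta": {"pile_set_name": "GitHub"}}',
    '{"text": "Objective: We studied ...", "meta": {"pile_set_name": "PubMed Abstracts"}}',
]
for text, source in iter_pile_records(sample):
    print(source, "->", text[:25])
```

Keeping per-record provenance like this is what lets downstream users filter the Pile to, or away from, particular component datasets.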


Praise
EleutherAI's work to democratize GPT-3 has won substantial praise from a variety of open-source advocates. It won the UNESCO Netexplo Global Innovation Award in 2021 and InfoWorld's Best of Open Source Software Award in 2021 and 2022, and was nominated for VentureBeat's AI Innovation Award in 2021.

Gary Marcus, a cognitive scientist and noted critic of deep learning companies such as OpenAI and DeepMind, has repeatedly praised EleutherAI's dedication to open-source and transparent research.

Maximilian Gahntz, a senior policy researcher at the Mozilla Foundation, applauded EleutherAI’s efforts to give more researchers the ability to audit and assess AI technology: “If models are open and if data sets are open, that’ll enable much more of the critical research that’s pointed out many of the flaws and harms associated with generative AI and that’s often far too difficult to conduct.”