Talk:DALL-E/Archive 1

DALL·E or DALL-E?
Do we know if the official name of the AI is DALL·E or DALL-E? OpenAI seems to be using DALL·E everywhere on their website, while external sources use DALL-E. LittleWhole (talk) 08:21, 7 January 2021 (UTC)


 * For what it's worth, roughly the same happened to WALL-E. Azai~enwiki (talk) 08:47, 11 January 2021 (UTC)
 * "Dall·E" is official, but everyone uses "Dall-E" instead, because the interpunct is difficult to type. ⇒ Zhing-Za, they/them, talk 20:11, 17 April 2023 (UTC)

A Commons file used on this page or its Wikidata item has been nominated for deletion
The following Wikimedia Commons file used on this page or its Wikidata item has been nominated for deletion:
 * DALL-E sample.png
Participate in the deletion discussion. —Community Tech bot (talk) 14:37, 6 May 2022 (UTC)

What to do with DALL-E 2
Last month OpenAI released DALL-E 2, a sequel to the DALL-E program. Should this page cover only DALL-E 1, with a new page created for DALL-E 2, or should the DALL-E page act as an umbrella for all future DALL-E iterations?

Camdoodlebop (talk) 01:38, 24 May 2022 (UTC)
 * I'm more inclined to agree with the second suggestion, but let's see what other editors have to say. - Munmula (talk), second account of  Alumnum 02:42, 24 May 2022 (UTC)
 * I think the latter is better, as DALLE2 is a successor of the first model. Artem.G (talk) 08:10, 24 May 2022 (UTC)

New sources on Dall-E Mini and related projects
Lizardcreator (talk) 02:24, 15 June 2022 (UTC)
 * https://www.businessinsider.com/dall-e-mini
 * https://www.cnet.com/culture/everything-to-know-about-dall-e-mini-the-mind-bending-ai-art-creator/
 * https://www.vice.com/en/article/3ad8yw/we-asked-an-ai-to-draw-a-self-portrait
 * https://www.theguardian.com/culture/2022/jun/09/what-exactly-is-ai-generated-art-how-does-it-work-will-it-replace-human-visual-artists

Undue weight
There is undue weight given to open-source models that try to imitate DALL-E (for example, DALL-E Flow wasn't mentioned anywhere as anything notable),

and undue weight given to the "hidden language" developed by the model. This is just too recent to include in an encyclopedic article right now, as very few reliable sources can be cited. The claim of a "language" is a very strong one.

I think both sections should be trimmed, but this should be discussed first. Artem.G (talk) 13:19, 22 June 2022 (UTC)

Article rewrite
I've BOLDly rewritten and reshuffled most of the article, including removing a large amount of WP:SYNTH and miscellaneous other poor organisation, along with drastically slashing the weight of the open source implementations. I'd welcome any comments people have about these changes. BrigadierG (talk) 16:00, 18 July 2022 (UTC)


 * I'll be honest, I kind of half-assed this article when I wrote it in January 2021, and I certainly haven't been keeping it up to date since then. I think that the open-source implementations are mostly not relevant, and a lot of the stuff that went out was not very good. On the other hand -- I see you're in the middle of rewriting, so I don't know if this stuff is going anywhere or if it's just being removed entirely, but it looks like you removed some stuff about the actual implementation of the model (such as CLIP was trained to predict which caption (out of a "random selection" of 32,768 possible captions) was most appropriate for an image, allowing it to subsequently identify objects in images outside its training set). If anything, the original article wasn't very good because I skimmed over many details of how the model worked (mostly because I hadn't bothered to read the actual paper yet, lol). Anyway, I will shut up for a bit and wait to see where you're going with this before I form a whole opinion about it. jp×g 18:51, 19 July 2022 (UTC)
 * My first pass was to kill the UNDUE material and improve article structure, my second pass is to bring the article up to date with more detail. BrigadierG (talk) 10:29, 20 July 2022 (UTC)
 * Thanks for taking this on, in particular for cutting out the editorializing and for spotting that faked citation.
 * However, something seems to have gone wrong with the citations in this "squashing": unless I'm overlooking something, the cited source doesn't make any claims about anatomical diagrams, X-ray images, mathematical proofs, or blueprints. Regards, HaeB (talk) 08:43, 28 July 2022 (UTC)
 * Excellent spot. I'll be honest, I didn't read that source, the claim seemed to *so obviously* match the source that I didn't bother verifying. My mistake, great job. I get it, I get it, WP:AGF, but by god, are people just inventing things? BrigadierG (talk) 13:39, 28 July 2022 (UTC)
 * Thanks, I have removed it accordingly. It's probably worth checking various other citations too. Regards, HaeB (talk) 03:35, 31 July 2022 (UTC)
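
For anyone restoring the implementation detail mentioned earlier in this thread (CLIP predicting which of 32,768 candidate captions best matches an image), the selection step can be sketched roughly as follows. This is a toy sketch, not OpenAI's code: random vectors stand in for the encoder outputs, and the embedding size, the helper names, and the use of cosine similarity as the score are assumptions made for illustration.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def best_caption(image_emb, caption_embs):
    """Return the index of the caption embedding most similar to the image embedding."""
    img = normalize(image_emb)
    best_idx, best_score = -1, -math.inf
    for idx, cap in enumerate(caption_embs):
        score = sum(a * b for a, b in zip(img, normalize(cap)))
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


random.seed(0)
DIM = 32              # hypothetical embedding size, chosen only to keep the demo fast
NUM_CAPTIONS = 32768  # batch size quoted in the thread
TRUE_INDEX = 12345    # arbitrary position of the "correct" caption

# Stand-ins for text-encoder outputs: one random vector per candidate caption.
captions = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_CAPTIONS)]

# Stand-in for the image-encoder output: the matching caption plus a little noise.
image = [x + random.gauss(0, 0.01) for x in captions[TRUE_INDEX]]

predicted = best_caption(image, captions)
print("predicted caption index:", predicted)
```

The sketch only covers picking a caption once embeddings exist; training the two encoders so that matching pairs score highly is a separate step that is not shown here.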

Image examples
If it helps alleviate NOT GALLERY concerns, perhaps we can agree on a few good examples of DALL-E images to be featured independently alongside the prose, rather than a dedicated gallery section?

I don't like content disputes, so I'm happy with a compromise here, but it would be a loss not to represent the product with at least a few samples. I have no preference as to which examples. ASUKITE 14:43, 31 July 2022 (UTC)


 * We should only be listing images that have something more of note than simply "this is interesting" (along with the rest of WP:ATA). The test for inclusion I think should be as follows:
 * 1. Has the image in question been cited by OpenAI or a WP:RS as displaying a significant capability of DALL-E?
 * 2. Is that significant capability better covered or covered more widely in RS by an image already included in the article?
 * Note that while artists' pages may include a significant number of their works, they are not present in isolation - they show a key part of that artist's life or style. That's what distinguishes artistic commentary from WP:NOTGALLERY. BrigadierG (talk) 15:57, 31 July 2022 (UTC)

Thanks. Art isn't usually my topic of choice. I'll see if I can pick a couple of decent samples at some point, now that the novelty of getting access to the beta has passed somewhat. ASUKITE 19:08, 31 July 2022 (UTC)
 * I think a small gallery would be really useful for people - there is a discussion of what the model can and cannot do, and showing more than one picture in the infobox will demonstrate the capabilities the model has. Artem.G (talk) 09:57, 2 August 2022 (UTC)


 * I think a gallery of varying examples would be a great idea. There is a precedent for this as the Japanese page for Stable Diffusion features a gallery of various different styles that the program can generate. Camdoodlebop (talk) 00:49, 11 September 2022 (UTC)


 * I've given it a second thought and I've changed my mind: I do not think an image gallery is necessary for this page, so count me in as a no. One example should be enough, I think. Camdoodlebop (talk) 23:10, 11 September 2022 (UTC)

No mention of Raven's Matrices
There is no mention of Raven's Matrices in the referenced article (https://en.wikipedia.org/wiki/DALL-E#cite_note-dale-25). If someone could find a better reference, please do. AcuteTriceratops (talk) 01:28, 12 August 2022 (UTC)


 * The source links to DALL-E's blog post, which explicitly mentions Raven's Matrices. I've added that as a source and removed the dubious tag. BrigadierG (talk) 20:44, 13 August 2022 (UTC)

How exactly is DALL-E an "implementation of GPT-3"?
The article's explanation of how DALL-E works currently begins by dwelling on GPT, and then states that
 * DALL-E's model is a multimodal implementation of GPT-3 with 12 billion parameters which "swaps text for pixels", trained on text-image pairs from the Internet.

Even granting some simplification for a general audience, this sentence seems highly misleading:

1. The "swaps text for pixels" part is misrepresenting the cited source, where "by swapping text for pixels" links to OpenAI's post about "Image GPT", a different model which is merely mentioned as a motivating idea for DALL-E. Image GPT did indeed simply swap text for pixels in a sense:
 * "Transformer models like BERT and GPT-2 are domain agnostic, meaning that they can be directly applied to 1-D sequences of any form. [...] we train GPT-2 on images unrolled into long sequences of pixels, which we call iGPT"

(Note by the way that that was based on GPT-2, not GPT-3.) But as even OpenAI's initial DALL-E announcement mentioned, far from using that simplistic approach of feeding pixel values directly into the transformer, DALL-E involved a much more sophisticated representation of the image, which presumably was an essential part of its success and was not trivial to construct:
 * "Similar to VQVAE, each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that we pretrained using a continuous relaxation. We found that training using the relaxation obviates the need for an explicit codebook, EMA loss, or tricks like dead code revival, and can scale up to large vocabulary sizes."

(This is also the model part which they released as the "DALL-E" PyTorch package.)

2. The other two cited refs do indeed say that DALL-E "is a multimodal version of GPT-3" (a footnote in the "Understanding ..." preprint which looks like a late addition) or "uses a 12-billion parameter version of GPT-3" (Venturebeat). But these appear to be merely based on the first sentence from OpenAI's initial announcement from January 5, 2021. Such public-facing blog posts and press releases are rarely the most reliable sources about research results, especially compared to published academic papers. ML YouTuber Yannic Kilcher already gently mocked that claim in a January 6 reaction video based on what was known at that point ("they say it's a 12 billion parameter version of GPT-3 ...you know, it's more like not GPT-3, that was more than 10 times larger"). And the actual DALL-E paper ("Zero-Shot Text-to-Image Generation", which came out more than seven weeks after OpenAI's announcement post and was retroactively linked in it) does not cite the GPT-3 paper ("Language Models are Few-Shot Learners"). Actually, it doesn't seem to contain the term "GPT" at all. That would be extremely unusual, to say the least, if DALL-E was really "a multimodal implementation of GPT-3".

3. Lastly, the sentence is also misleading in the sense that the CLIP model (for selecting the best outputs of the VAE+transformer model) appears to be an essential part of what has been announced and discussed as DALL-E (and hence already makes up a large part of the "Technology" section of this article, as it should).

This article still receives thousands of pageviews per day and our readers really deserve better. I may try to fix and expand this section myself a bit later. (It also only has a single sentence on how DALL-E 2 works, even though there are presumably important differences considering that it is based on a diffusion model.) But I'm not an expert on this topic and have only started to read more about it, so others who are familiar with the matter should feel free to jump in. Regards, HaeB (talk) 10:39, 27 September 2022 (UTC)
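
To put rough numbers on points 1 and 2 above, the following back-of-the-envelope sketch compares the sequence length of an iGPT-style pixel unrolling with the 32x32 grid of discrete codes quoted from OpenAI's announcement, and checks the "more than 10 times larger" remark. The 256x256 input resolution, the simplification of one token per pixel, and the 175 billion parameter figure for the largest GPT-3 model are assumptions not stated in this thread; the function names are hypothetical.

```python
def unrolled_pixel_length(height: int, width: int, tokens_per_pixel: int = 1) -> int:
    """Sequence length when an image is unrolled pixel by pixel (iGPT-style)."""
    return height * width * tokens_per_pixel


def latent_grid_length(grid_side: int = 32) -> int:
    """Sequence length when an image is first compressed to a square grid of discrete codes."""
    return grid_side * grid_side


IMAGE_SIDE = 256  # assumption: input resolution, not given in the thread

pixel_tokens = unrolled_pixel_length(IMAGE_SIDE, IMAGE_SIDE)
latent_tokens = latent_grid_length(32)  # 32x32 grid, as quoted from the announcement
compression = pixel_tokens // latent_tokens

GPT3_PARAMS = 175e9   # assumption: size of the largest GPT-3 model
DALLE_PARAMS = 12e9   # "12 billion parameters", as quoted in the article sentence
size_ratio = GPT3_PARAMS / DALLE_PARAMS

print(f"pixel-unrolled sequence length: {pixel_tokens}")
print(f"latent-grid sequence length:    {latent_tokens}")
print(f"sequence shortened by a factor of {compression}")
print(f"GPT-3 / DALL-E parameter ratio: {size_ratio:.1f}")
```

Under these assumptions the transformer sees a sequence dozens of times shorter than a direct pixel unrolling would give, which is the sense in which the discrete VAE is more than a cosmetic change from "swapping text for pixels".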

Are resulting images suitable for uploading to the commons?
If an image is generated by DALL-E 2, based on a prose description provided by a Wikipedian, is that resulting image eligible to be uploaded to Wikimedia Commons, or are there copyright issues preventing its use? Thanks! Lbeaumont (talk) 20:15, 22 October 2022 (UTC)
 * Yes, Commons has a category at https://commons.wikimedia.org/wiki/Category:DALL-E and seems to consider DALL-E output to fall under their template. --Belbury (talk) 22:22, 22 October 2022 (UTC)
 * One point to note is that the PD-algorithm license does require the image to have been produced in the United States. Other countries, like the United Kingdom for example, do not have the same AI copyright laws. –– FormalDude  (talk)  00:04, 23 October 2022 (UTC)
 * See in particular the previous deletion discussions listed at c:Category_talk:DALL-E. A deletion on copyright grounds was twice rejected (without country-specific considerations), but some images have been deleted as being unused/out of scope for Commons. Regards, HaeB (talk) 06:57, 25 October 2022 (UTC)

The attribution of an anthropomorphic feature to the software.
The article presents the software as a being rather than a tool.

The vocabulary shows this: "An image generated by DALL-E 2"...

I propose, for example, to rewrite as: "An image generated with DALL-E 2"

This distinction is important because the status given to this type of software can harm the image that humans have of themselves and can participate in the development of sectarian currents on the adoration of these technologies. (example: https://www.wired.com/story/anthony-levandowski-artificial-intelligence-religion/) DDiederichsen (talk) 21:16, 30 December 2022 (UTC)


 * I don't know about sectarian currents of adoration, but the OpenAI website itself seems to have - possibly very recently? - switched from crediting images as "generated by" to "created with".
 * Moving towards considering these images to be created by humans using DALL-E as a tool may be at odds with Wikimedia Commons' view that all such images are "in the public domain because, as the work of a computer algorithm or artificial intelligence, it has no human author in whom copyright is vested". Belbury (talk) 15:18, 3 January 2023 (UTC)


 * It seems like pretty normal English to say that a thing has been done "by" a machine.
 * Off the top of my head I tried Pillars of Creation, and there are many images that are described as having been created "by" a machine.
 * I'm not sure what a "sectarian current of adoration" is, but does it also apply to telescopes? ApLundell (talk) 15:37, 3 January 2023 (UTC)
 * too complicated Wik1234569 (talk) 16:07, 13 March 2023 (UTC)

A new model of DALL-E is being released eventually
I don't have access, but some people who had early access to DALL-E 2 do, and I saw a YouTube video about it.

I think the old one looks better, but when this gets officially released to the public, would it get added? Wik1234569 (talk) 04:31, 9 March 2023 (UTC)