Talk:Large language model

Model list as prose?
I think the table is as good as a table can be, but a prose format would probably be more flexible, letting us compare and contrast the models. There aren't really a lot of reliable sources right now, but if we get into unreliable sources there is a fair amount of comparison to be done as far as performance and licensing restrictions. Mathnerd314159 (talk) 23:08, 12 March 2023 (UTC)
 * I wouldn't want to lose the table. It allows the reader to easily find and compare certain key characteristics (who made it, when was it released, how many parameters does it have) in a way that just wouldn't be possible with prose. The "Notes" column also provides a bit of a compromise, in that we can put arbitrary prose about any model in there. I didn't make much use of that column in my first draft of the table, but I was planning on filling it out more soon (maybe today).
 * That said, I wouldn't necessarily be opposed to also having some prose discussing and contrasting some of the most noteworthy LLMs (especially if those comparisons are supported directly by RS, rather than being WP:SYNTH-y). One way to do that could be to have a top-level "Notable large language models" section that starts with some prose and then contains the current list as a subsection. At some point it may be even appropriate to split off into a stand-alone list article, though I don't think we're there yet. Colin M (talk) 15:10, 13 March 2023 (UTC)
 * Well, the expense of training LLMs is a natural selection criterion, so I don't think we'll have a problem with scope creep. I guess I'll just work on the table some more, add a license column and additional notes. Mathnerd314159 (talk) 15:38, 13 March 2023 (UTC)

Character.ai
Is Character.ai an LLM? I think so, but somehow can't find any real info about the model behind the product. Artem.G (talk) 18:08, 13 March 2023 (UTC)
 * The answer to "What is the technology behind Character.AI?" on their FAQ suggests that it is. Or, to be pedantic, it's a product that uses an LLM. But it doesn't look like there's enough public information available for us to be able to say much about the characteristics of the LLM that backs it. Colin M (talk) 18:53, 13 March 2023 (UTC)
 * Yeah, all we really know is that the developers previously worked on LaMDA. So their model is probably pretty similar in terms of coding, but I couldn't find any information on parameters or corpus or anything like that. Clearly it's their "secret sauce" and they don't want to make any information public. Mathnerd314159 (talk) 18:53, 14 March 2023 (UTC)

There is also Claude AI, which should be released soon. Artem.G (talk) 11:38, 14 March 2023 (UTC)


 * Well, even if it isn't released yet, there is enough detail in Appendix A to add it to the table. Says it's a 52B parameter model. Mathnerd314159 (talk) 18:00, 14 March 2023 (UTC)

GPT-4
It's out, kind of. But there's no information; the paper says "GPT-4 is a Transformer-style model [33] pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers. The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF). Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." I guess we list the information as confidential? It's kind of the slippery slope of the list turning into a WP:DIRECTORY of AI companies rather than a useful comparison. Mathnerd314159 (talk) 20:23, 14 March 2023 (UTC)


 * though not everything is known, it's ridiculous to omit the most hyped and well-known LLM, so I've added it to the table. Artem.G (talk) 15:51, 15 March 2023 (UTC)

Criteria for list inclusion
For the sake of keeping the "List of language models" table to a manageable size and useful to readers, I'd like to suggest we limit it to models which have some coverage in secondary sources (even if it doesn't rise to the level of WP:SIGCOV required for a standalone article). I'm pretty sure all the current entries meet this criterion (though one row I just removed did not).

Also, while it hasn't been a problem up to this point, we should be cautious about including models that are merely fine-tuned versions of models that are already present on the list. If we start getting a lot of those, maybe we can consider splitting them off into a separate subsection/table. Colin M (talk) 14:45, 19 March 2023 (UTC)


 * I support this criterion. If we just include every model that's mentioned in every paper, we'd be breaking WP:NOTDIRECTORY and not really providing something that's very useful. Popo Dameron  ⁠ talk  16:55, 19 March 2023 (UTC)
 * Agree, it makes sense to have stricter inclusion criteria or this list will soon become enormously large. Artem.G (talk) 19:47, 19 March 2023 (UTC)
 * As I said above I think the expense of training the models is a natural limiting factor. So the list simply can't become enormously large unless some breakthrough in training occurs and people can do it at home. There are a few commercial models (Jurassic) that don't have much coverage that I would like to add, and fairseq seems similar. Including these makes the list more complete and hence more useful, even if they don't have many details. But yeah, there's some editorial discretion in deciding what is derivative vs original, so it also makes sense to avoid a WP:INDISCRIMINATE list. I would say the criteria should mainly be having >1 billion parameters,  but also including notable models with a smaller number of parameters. Mathnerd314159 (talk) 01:31, 20 March 2023 (UTC)
 * Regarding, I'm confused about your position here. I strongly disagree with considering corporate blog posts sufficient for notability, because Wikipedia is not for advertising products and brands. If there are better sources available, the article should show that.
 * Preprints are a separate issue, and I concede restoring Galactica for now. WeyerStudentOfAgrippa (talk) 14:19, 7 February 2024 (UTC)
 * OK, let's see. You removed Glam - here is preprint, YaLM 100B - here it is in the news, Falcon 180B - preprint, Mistral 7B - preprint, Claude 2.1 - in the news, and it's Anthropic's latest model, Phi-2 - in the news, and only for Eagle 7B there are no better sources; I've added it just because it's the only model based on a different architecture and thus interesting. For everything else you can find multiple reliable sources, you just didn't bother to do it. Artem.G (talk) 16:09, 7 February 2024 (UTC)
 * I don't think that a preprint is in and of itself enough (anyone can publish one, after all), but all of these models are talked about enough in the community to, in my opinion, warrant inclusion. Doesn't make them notable enough for an article, of course, but this list certainly should have lower standards than that. popo dameron  ⁠ talk  16:21, 7 February 2024 (UTC)
 * Agree, I don't say each model is worthy of an article, but all (except maybe for Eagle) are notable enough for this list. Artem.G (talk) 16:38, 7 February 2024 (UTC)

I have a question about this one: Generative pre-trained transformer. Should it even exist, or be included in the list? It was strange to find this article; I'm not sure I saw the link anywhere, as everybody just links to GPT-n articles, not this main overview one. Does anyone have any thoughts? Artem.G (talk) 20:23, 19 March 2023 (UTC)
 * We could consider having an entry for the original iteration ("GPT-1"), though it seems to me it falls just a little shy of the threshold for really being considered an LLM. But if you can find quality sources that describe it as such, I wouldn't have a problem with including it. Colin M (talk) 20:31, 19 March 2023 (UTC)
 * Yeah, I think that the current statement in the article that LLMs generally have parameters in the order of billions probably isn't completely true. I think I would consider T5-Large with its 770 million parameters to be an LLM, though I have seen terms like "medium-sized LMs" floated around in literature too. Anyway, I doubt there are enough reliable sources about GPT-1 to merit an entry. Popo Dameron  ⁠ talk  00:24, 20 March 2023 (UTC)
 * I think that article falls under WP:BROADCONCEPT, so in spirit it's basically a disambiguation page. I would not include it here, only specific models. Mathnerd314159 (talk) 01:20, 20 March 2023 (UTC)

NLP
see NLP: I assume NLP stands for "Natural language processing". Stating that explicitly would do no harm. Jyyb (talk) 08:50, 3 April 2023 (UTC)
 * I've expanded the abbreviation at the first instance in the article where it's used. Does that help? It's tricky because we don't want to confuse the reader with opaque acronyms, but spelling out the full phrase at every use gets kind of verbose. Colin M (talk) 16:49, 4 April 2023 (UTC)
 * Yes, that helps, thanks. —Alalch E. 16:53, 4 April 2023 (UTC)

'Hallucinations' are just hype
I deleted the section about hallucinations on the grounds that LLMs cannot hallucinate and the use of the term is just marketing hype. If anyone is interested, this framing of the issue has been criticized by public figures (https://undark.org/2023/04/06/chatgpt-isnt-hallucinating-its-bullshitting/) AdamChrisR (talk) 01:22, 8 April 2023 (UTC)


 * 'Hallucination' is a technical term. You linked an opinion piece that suggests adopting a different term for the same meaning, but at this moment, it is indeed a well established term. Yes, it does come from a concept that is usually attributed to humans, but that doesn't mean it has to perfectly match the human version. For a similar disparity, see Attention (machine learning). As long as these terms are in common use in research literature, however, there is good reason to cover them on Wikipedia. Popo Dameron  ⁠ talk  01:45, 8 April 2023 (UTC)
 * In that case, I think that the 'hallucination' content should be moved into another section where it can be put into context (it's a one-sentence section, so it should be merged regardless of the confusion this term causes). For now, I'll just merge it with the previous paragraph about the ability of the LLM to often regurgitate correct facts, though it may be more appropriate in the section about emergent properties or applications. Based on the academic work cited in the Hallucination article, it sounds like Hallucination is only an issue when the LLM is used for a specific application (in which case I'd just call it an error, but I guess I'm old fashioned). AdamChrisR (talk) 13:01, 8 April 2023 (UTC)
 * "Hallucination" is figurative speech used as a term of art. I don't see how it can be marketing hype. —Alalch E. 21:04, 8 April 2023 (UTC)

Scaling laws section
The scaling laws section looks too technical and too specific for a general article, especially with the table in the Chinchilla law subsection. How is it valuable and why should it be here? Maybe a small paragraph on scaling laws would be better? Artem.G (talk) 10:26, 1 May 2023 (UTC)


 * The Chinchilla scaling law specifically is basically the reference used in all the latest LLM training runs such as GPT-4 (suspected), LLaMA, etc. I found no good online reference so I went through the papers and wrote the section myself. As for general interest -- I think it has general interest, if only to understand why the LLM parameters, computing budget, and datasets are chosen the way they are. It might seem technical for an LLM article but it really just involves some high school algebra.
 * Probably good to split it into its own article (the scaling laws don't just apply to LLM). I plan to do that after I get through some more papers. pony in a strange land (talk) 21:11, 1 May 2023 (UTC)
 * As a quick demonstration of how it can be used for general interest: if you look at the Chinchilla scaling law table, and look at how big the largest available text corpus is (~10 trillion tokens), you immediately see that the efficient network size is 500 billion parameters, and 3.43e+25 FLOPs, which is about 3600 A100-GPU-years.
 * Quick conclusions:
 * GPT-4 is most likely around 500B parameters, probably less since they have to not just train the model, but also use it. When you want to use an LLM a lot, it saves money to train a smaller model with a larger training compute budget.
 * OpenAI probably used a substantial fraction of all the GPU it has available (on the order of 10000) for perhaps 4 months.
 * The cost of training it is probably >= 80 million USD, since 1 A100-GPU-hour = 2.5 USD, and 1 A100-GPU-year = 22000 USD.
 * LLMs aren't going to get much larger, both due to lack of datasets and lack of computing hardware. Lack of money, though, isn't yet a serious concern (Microsoft 2022 revenue is 70 billion USD).
 * Another: the largest LLaMA model has 65B parameters, trained on 1.4 trillion tokens. This might seem a bit random but if you look at the Chinchilla table it is exactly what Chinchilla scaling recommends. As the LLaMA paper states:
 * > Our training approach is similar to the methods described in previous work (Brown et al., 2020; Chowdhery et al., 2022), and is inspired by the Chinchilla scaling laws (Hoffmann et al., 2022). We train large transformers on a large quantity of textual data using a standard optimizer
 * I think all these are actually quite vital info for people trying to build some quick intuition on how large the models can be, what they could do, how much it would cost to train, how much it would cost to run, etc. pony in a strange land (talk) 21:39, 1 May 2023 (UTC)
 * these "quick conclusions" are OR by wikipedia standards. I do think it's a valid topic, and I agree that it should be in its own article, but let's not push any OR. Artem.G (talk) 06:29, 2 May 2023 (UTC)
 * Which is why I didn't put any of those into the article. I am illustrating how a reader can use those scaling laws for making sense of AI numbers. It would be overbearing of me to push these extrapolations as facts. pony in a strange land (talk) 06:40, 2 May 2023 (UTC)
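The back-of-envelope arithmetic in this thread can be roughly reproduced with a short script. This is only a sketch under stated assumptions, not the Chinchilla table itself: it assumes the commonly quoted rule of thumb of about 20 training tokens per parameter, the approximation that training compute is about 6 × parameters × tokens, and an A100 peak throughput of 312 TFLOP/s with perfect utilization. The 2.5 USD per A100-GPU-hour price is the figure quoted in the thread. The function name `estimate_training_run` is hypothetical.

```python
# Rough reproduction of the talk-page arithmetic. All constants below are
# assumptions for illustration, except the GPU-hour price quoted in the thread.
TOKENS_PER_PARAM = 20.0              # assumed compute-optimal rule of thumb
FLOPS_PER_PARAM_PER_TOKEN = 6.0      # assumed approximation: C ~ 6 * N * D
A100_PEAK_FLOPS_PER_SECOND = 312e12  # assumed peak throughput, 100% utilization
HOURS_PER_YEAR = 365 * 24
USD_PER_A100_HOUR = 2.5              # price quoted in the thread


def estimate_training_run(tokens: float) -> dict:
    """Estimate model size, compute, GPU time and cost for a given corpus size."""
    params = tokens / TOKENS_PER_PARAM
    flops = FLOPS_PER_PARAM_PER_TOKEN * params * tokens
    gpu_seconds = flops / A100_PEAK_FLOPS_PER_SECOND
    gpu_years = gpu_seconds / (3600 * HOURS_PER_YEAR)
    cost_usd = gpu_years * HOURS_PER_YEAR * USD_PER_A100_HOUR
    return {
        "params": params,
        "flops": flops,
        "gpu_years": gpu_years,
        "cost_usd": cost_usd,
    }


# Corpus of ~10 trillion tokens, as discussed in the thread.
estimate = estimate_training_run(10e12)
for name, value in estimate.items():
    print(f"{name}: {value:.3g}")

# LLaMA 65B was trained on 1.4 trillion tokens; check the tokens-per-parameter ratio.
llama_ratio = 1.4e12 / 65e9
print(f"LLaMA 65B tokens per parameter: {llama_ratio:.1f}")
```

With these simplified assumptions the script gives 500 billion parameters, about 3e25 FLOPs and roughly 3,000 A100-GPU-years, which is the same order of magnitude as the 3.43e+25 FLOPs and 3600 GPU-years read off the table in the thread. Real training runs achieve well below peak throughput, so actual GPU time and cost would be higher.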

Subset of a Foundation Model?
In the first section, there should be a reference to Foundation Model (https://en.m.wikipedia.org/wiki/Foundation_models). Note the first paragraph of Definition section for this Foundation Model entry. Given the 2021 Stanford study, this provides a more substantive definition for LLM as a subset of FM. Suggest the following...

New paragraph before "LLMs are general purpose models" and add to previous paragraph:

LLMs are a subset of foundation models that are trained on large language corpora. Continuing trends toward multi-modal data, like video, for training and output responses will blur the language distinction of LLMs.

Upon further research... -- Note this Twitter discussion https://twitter.com/tdietterich/status/1558256704696905728 There is a bit of controversy here. FM is too hype-y & 'grandiose'. What is 'large'? LLM does not capture the non-linguistic aspects. -- Also this follow-on discussion https://twitter.com/ylecun/status/1558395878980861952

-- My opinion is... Important to capture the concept of a pre-trained off-the-shelf base (or basis) upon which to build specific applications. Hackathorn (talk) 16:49, 13 May 2023 (UTC)


 * The term "Foundation Model" is subject to a substantial amount of debate and criticism and is not something I would recommend using by default. Stellaathena (talk) 15:45, 14 July 2023 (UTC)

Neural network - by definition?
Article says an LLM is a "language model consisting of a neural network". It's actually a language model which is relatively large. 146.115.70.94 (talk) 16:43, 26 May 2023 (UTC)
 * It does kind of say that.—Alalch E. 18:19, 27 May 2023 (UTC)

Emergent Abilities
Article says:

''While it is generally the case that performance of large models on various tasks can be extrapolated based on the performance of similar smaller models, sometimes "breaks" in downstream scaling laws occur such that larger models suddenly acquire substantial abilities at a different rate than in smaller models. These are often referred to as "emergent abilities", and have been the subject of substantial study. Researchers note that such abilities often "cannot be predicted simply by extrapolating the performance of smaller models". These abilities are discovered rather than programmed-in or designed, in some cases only after the LLM has been publicly deployed. Hundreds of emergent abilities have been described. Examples include multi-step arithmetic, taking college-level exams, identifying the intended meaning of a word, chain-of-thought prompting, decoding the International Phonetic Alphabet, unscrambling a word’s letters, identifying offensive content in paragraphs of Hinglish (a combination of Hindi and English), and generating a similar English equivalent of Kiswahili proverbs.''

... but fails to mention viewpoints like [1], which contend that the emergent abilities are more a function of selecting metrics that harshly penalize smaller LMs.

I'm not an LLM expert, but I found it incomplete that the "Emergent Abilities" section doesn't include qualifications from what I presume are reputable sources.

[1]: https://hai.stanford.edu/news/ais-ostensible-emergent-abilities-are-mirage (May 8th, 2023) Chege711 (talk) 17:12, 4 June 2023 (UTC)
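The metric-choice argument raised above can be illustrated with a toy calculation. This is a minimal sketch, not taken from the cited source: it assumes a hypothetical series of models whose per-token accuracy improves smoothly, and an exact-match metric that only gives credit when every token of a 10-token answer is correct, with errors treated as independent. The accuracy values are made up for illustration.

```python
def exact_match_accuracy(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that every token is correct, assuming independent errors."""
    return per_token_accuracy ** answer_length


# Hypothetical per-token accuracies for increasingly large models (made up).
per_token = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
answer_length = 10

exact = [exact_match_accuracy(p, answer_length) for p in per_token]

for p, e in zip(per_token, exact):
    print(f"per-token accuracy {p:.2f} -> exact match on {answer_length} tokens {e:.4f}")
```

Under these assumptions the per-token accuracy rises steadily, while the exact-match score stays near zero for the smaller models and then climbs steeply, which is the kind of apparent jump that a harsh all-or-nothing metric can produce.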

Copying this page
I haven't been on Wikipedia much recently, so perhaps things have changed. I wanted to copy this article into my personal Google Docs to use in my own study of large language models. However, all efforts to copy failed. Is there some sort of copy protection on this page? Is there some other way to make such a copy besides Ctrl-a & Ctrl-c? Thanks. Natcolley (talk) 00:05, 27 June 2023 (UTC)


 * Article > Tools > Download as PDF will usually work. -- Ancheta Wis   (talk  | contribs) 18:21, 5 July 2023 (UTC)

Source of the term and disambiguation
Having worked with LLMs since ~2018, I am seeing a lot of people discovering the topic in 2022/2023 and thinking that LLMs are only multi-billion parameter models allowing for prompting, whereas it is pretty much any post-ELMo model (btw a section on the only pre-transformer LLM could be useful as well). If no one finds an issue with it, I will add a paragraph about the term's history and some of the early usages. Andrei.chiffa (talk) 06:54, 5 July 2023 (UTC)


 * Why would anyone find an issue with it? Go ahead and add the section on 'pre-transformer' LLMs. It would be interesting to know more about them, too. DancingPhilosopher  ( talk ) 11:09, 7 July 2023 (UTC)
 * Looks like someone did, because that part is no longer in the article after the recent round of edits. Andrei.chiffa (talk) 09:35, 28 July 2023 (UTC)
 * Based on commits it looks like it was @DancingPhilosopher who removed the mention of ELMo (pre-transformer LLM) and the LLM definition on the 26th of July. Would you mind elaborating as to why? In its current state the article gives too much importance to the Transformer transition rather than the large dataset-based generative pretraining paradigm. Andrei.chiffa (talk) 09:45, 28 July 2023 (UTC)
 * I agree, the article currently puts undue weight on transformers and incorrectly paints a picture that transformers are the only way LLMs are built today. I've had to update the introduction a couple times now to remove incorrect characterizations to the contrary. I just did it again. StereoFolic (talk) 15:18, 28 July 2023 (UTC)
 * it's strange to say that transformers have an undue weight - _all_ llms except Elmo are transformer-based models. Artem.G (talk) 16:56, 28 July 2023 (UTC)
 * Yeah "undue weight" is probably the wrong phrase. What I mean is that if there is such an overlap maybe the articles should be merged. My understanding is LLMs are a broad category, and transformers are not expected to forever be the state of the art. It's not confirmed in detail at the moment but I believe DeepMind is planning something that incorporates AlphaZero-type approaches into their next flagship LLM. StereoFolic (talk) 17:20, 28 July 2023 (UTC)
 * The thing is that while ELMo is the only model, it is also the one to have introduced the whole concept of "Lots of parameters, lots of data and all the compute you can get your hands on" and demonstrate that it worked. Basically, that LLMs were worth being investigated.
 * Transformer/attention-only removed the dependence on RNNs and allowed a fully parallel training of the model, making the training massively scalable. This is the reason that every single model afterwards used a Transformer-derived architecture rather than going back to RNNs.
 * But the Transformer is not the starting point for LLMs, it's just the thing that allowed them to scale and scale and scale until we ran out of publicly accessible data. Andrei.chiffa (talk) 09:03, 4 August 2023 (UTC)

Improve the introduction of this article?
Hi there, I came across this term "Large Language Model" and looked it up. The first sentence of this article is extremely verbose and honestly I still don't know what LLM is. Is it a software program? A theory? An idea? Is there anyone who knows about this topic who can condense it a bit? Here is the first sentence as it appears right now:

A large language model (LLM) is a deep-learning-based language model, embodied by an artificial neural network using an enormous amount of "parameters" ("neurons" in its layers with up to tens of millions to billions "weights" between them), that are (pre-)trained on many GPUs in relatively short time due to massive parallel processing of vast amounts of unlabeled texts containing up to trillions of tokens (parts of words) provided by corpora such as Wikipedia Corpus and Common Crawl, using self-supervised learning or semi-supervised learning, resulting in a tokenized vocabulary with a probability distribution. D rock naut (talk) 13:48, 24 July 2023 (UTC)


 * Agreed, thank you for the feedback. I've gone and reworked much of the introduction, happy to hear any further notes. StereoFolic (talk) 16:31, 24 July 2023 (UTC)
 * Thanks! D rock naut (talk) 17:20, 26 July 2023 (UTC)

Training cost in the List
how is this source reliable? [127] "Parameter, Compute and Data Trends in Machine Learning". Google Docs. Should it be used? It's a user-generated doc, if it contains sources these sources should be used instead of it. Artem.G (talk) 19:40, 24 July 2023 (UTC)


 * Agreed. I've seen that source pop up elsewhere as well. It seems like a pretty straightforward non-WP:RS StereoFolic (talk) 20:09, 24 July 2023 (UTC)
 * removed. Artem.G (talk) 07:20, 25 July 2023 (UTC)

Elmo
Hey, sorry, but this: "In 2018, processing an entire sentence, before assigning each word in it an embedding, had been proposed by the ELMo model. To calculate such, deep contextualized embeddings for each word, it used a bi-directional LSTM, trained on a specific task." makes no sense to me. Can you rewrite it in simpler language? Artem.G (talk) 19:29, 29 July 2023 (UTC)

Algorithmic biases
I saw that some content specific to algorithmic bias was recently added. In particular, the section "Bias and Limitations" was created with content reused from Algorithmic bias. But algorithmic bias is not really specific to LLMs so it's probably too much coverage for this article. I think it would be good to replace the added content with a succinct summary of algorithmic biases as a subsection of the section "Wider impact", and to add a link to the main article "Algorithmic bias". Alenoach (talk) 18:36, 26 December 2023 (UTC)


 * I made the modification. I also removed the sub-section "Language Bias" because all of the biases presented seem to be language biases. And I replaced some primary references with secondary references for easier verifiability. I'm still a bit uncomfortable with the part "it might associate nurses or secretaries predominantly with women and engineers or CEOs with men", because it's not clear that LLMs should be blamed for having these priors; I would welcome a less controversial example to illustrate the issue. Alenoach (talk) 04:32, 29 December 2023 (UTC)

Move List to a separate article?
As mentioned in "criteria" above, it would seem to make sense to limit the list to foundational rather than fine-tuned models. But then that would disqualify the landmark GPT 3.5, which is a chat-oriented fine-tuned version of GPT 3.0, from ever having its own dedicated entry. In contrast, the recently-added Neuro-sama strikes me as being more of a use case (albeit perhaps fine-tuned model) than a model itself. But I don't feel comfortable deleting it because it is wikilinked to a dedicated article, establishing notability. Due to the nebulous nature of criteria of inclusion, and due to the list length growing unwieldy, and due to the emerging nature of the topic, I propose that the list be moved to its own dedicated article. Michaelmalak (talk) 10:05, 28 January 2024 (UTC)


 * I'm not against a separate list, but I think that a list of most influential models is still needed here (bert, gpts, llama, claude, gemini, mistral, etc) Artem.G (talk) 10:28, 28 January 2024 (UTC)
 * I agree that the foundational models should be talked about here (maybe just in prose) but that the long list would be better as its own article in list of large language models. Popo Dameron  ⁠ talk  21:07, 29 January 2024 (UTC)
 * I did create a page last month, but deleted it after I realised this existed Mr Vili   talk  04:50, 23 February 2024 (UTC)
 * Support moving list due to article length. WeyerStudentOfAgrippa (talk) 16:53, 1 February 2024 (UTC)
 * Reduce to non-table list of notable links. WeyerStudentOfAgrippa (talk) 17:58, 1 February 2024 (UTC)
 * Oppose, as WP:WHENSPLIT does not indicate splitting.—Alalch E. 17:02, 1 February 2024 (UTC)
 * I don't think that size is a reason to split here, but having a long list of every LLM with a shred of notability doesn't feel relevant or useful to this article, in my opinion. I think that replacing the section with a link to a separate page would be a lot cleaner and keep both articles focused. Popo Dameron  ⁠ talk  17:22, 1 February 2024 (UTC)
 * Nice catch, the article appears to be about 900 words below the 6k readable prose threshold. However, the list in table format feels long and unnecessary here.  Several items appear to be sourced to corporate blog posts or preprints.  If the list is to remain here, it could be reduced to a non-table list of notable links. WeyerStudentOfAgrippa (talk) 17:51, 1 February 2024 (UTC)
 * I had actually been thinking about the list table recently. I would have suggested creating a modified, chatbot-listing version of the table at List of chatbots, to which Comparison of user features of chatbots could also probably be merged. – Gluonz  talk contribs 17:15, 1 February 2024 (UTC)
 * I think we should introduce limits to the list to be limited to only base models, perhaps over a certain parameter size, or merging multiple LLM versions into the same listing.
 * This will get excessively long over time, and the prohibitive cost of training large language models should prevent it spiraling out of control Mr Vili   talk  04:49, 23 February 2024 (UTC)

Reduce emphasis on non-transformer LLMs?
The opening paragraph includes the text, "Some recent implementations are based on other architectures, such as recurrent neural network variants and Mamba (a state space model).[2][3][4]". I believe this text should be moved MUCH later in the article, if it is mentioned at all. I don't think the citations included are sufficient to demonstrate the notability of these alternatives to the dominant architecture. Is there agreement on this? --Yoderj (talk) 21:15, 21 February 2024 (UTC)


 * agree, no major LLM is based on alternative architectures, so it's undue to mention it in the lead. Artem.G (talk) 21:34, 21 February 2024 (UTC)

It is weird that this article attributes intelligence to LLMs
The first sentence of this article claims that LLMs achieve “understanding”. Later it attributes “knowledge” to the model.

Using these terms implies consciousness in the models, which is a very strange thing to do. Maybe even misleading.

In general, the article fails as an encyclopedia article in that it doesn’t have enough basic information about LLMs to enable a nontechnical reader to obtain a basic understanding. It is a jumble of unexplained jargon peppered through passages where the topic sentences don’t match the rest of the paragraphs. And then it mixes in these implications that the machines are thinking.

It could use some help. Lot  49a talk 12:12, 4 March 2024 (UTC)
 * I agree that more basics need explaining, but I don't think the claims of understanding are being made implicitly; they are being stated explicitly with reference to the opinions of scholars. Understanding implies consciousness? Citation needed for that! :) The tests for understanding of language aren't tests of whether something is conscious, so this sounds like your own personal theory of understanding. Two points about adding "citation needed": 1) This needs to be done by adding a template, not just typing "citation needed" into the article; 2) Ideally the lead paragraph of an article should have no citations, because everything stated in that paragraph should be a summary of the full article, where the facts should be cited. You were right to highlight a problem with that sentence because the full article says it's controversial whether LLMs achieve understanding, but the lead sentence said categorically that they do. MartinPoulter (talk) 12:48, 4 March 2024 (UTC)
 * Thanks. I appreciate the feedback and also the editor who took some time to improve the introduction. I made an attempt to organize the introduction a little more but since I don't really understand this topic there's not much I can do to help improve it for fear of breaking it. But it really needs work. Lot   49a talk 23:52, 4 March 2024 (UTC)

Reasoning
The most important aspects of LLMs, in my understanding, are their ability to do limited reasoning and to answer specific questions. People are also excited about their coding abilities. The intro doesn't say that. Should it be added? Currently the intro says a variety of text generation tasks, which is rather bland and uninteresting. I can dig up references if that will help. Per wp:lead the lead should contain a summary of the most important content. Daniel.Cardenas (talk) 16:18, 15 March 2024 (UTC)
 * Thank you for the offer. More fundamentally, I am beginning to feel that "language" within the term "LLM" is a misnomer. Any Spanish dictionary or Russian grammar book contains "language" but an LLM goes beyond language by achieving communication and even, as you point out, reasoning. If I am right (and I am no expert), the summary should mention that LLM is a misnomer (which I think has confused several commentators here). 2A00:23C6:54AD:5701:35A2:4775:8118:3606 (talk) 13:31, 28 April 2024 (UTC)

Section "Tool use": Unfinished sentence?
There seems to be an unfinished sentence (or maybe headline?) here: "This basic strategy can be sophisticated with multiple attempts of generated programs, and other sampling strategies. Cost Savings and Reduced Vendor Dependency" Meltron1 (talk) 16:02, 19 April 2024 (UTC)

Transformer architecture
Recent edits in the lead section suggest that all LLMs use the transformer architecture. However, this seems to contradict the section "Alternative architecture". WeyerStudentOfAgrippa (talk) 22:08, 1 June 2024 (UTC)


 * well, all "largest and most capable", including all models from the list, are based on the transformer. Alt architectures exist, though they are mostly experiments. Artem.G (talk) 05:39, 2 June 2024 (UTC)