Talk:Explainable artificial intelligence

Recent edits
This recent edit removed one of the references from this article. Why did you remove this reference? Jarble (talk) 13:10, 13 August 2019 (UTC)

Changes for History Section, Research in Explanation in the 70s, 80s, and 90s in Symbolic AI
The previous discussion of XAI did not address the large amount of work done in explanation in the 70s, 80s, and 90s. Although XAI is commonly used in the context of deep learning, to restrict discussions of XAI to deep learning alone is to presuppose that XAI will only be developed in that context and not in the context of hybrid symbolic / deep learning systems. At a minimum, to do justice to the historical record, the earlier work needs fuller coverage.

I just wanted to explain a bit more about the changes I added to flesh out the history of explanation in the 70s, 80s, and 90s. The previous coverage mentioned only MYCIN and ignored explanation research in intelligent tutoring systems, causal reasoning, explanation-based learning, and truth maintenance systems, an omission I have tried to correct.

Indeed, much more could also be said about the ability of current ontological, semantic web, knowledge based, and intelligent tutoring systems to support explanation. I haven't pursued that at this point, or the point about symbolic approaches addressing primarily what Daniel Kahneman calls System 2 thinking while deep learning approaches better address System 1 thinking. I know that Yoshua Bengio and Gary Marcus have debated this, while Doug Lenat, Chris Re, Oren Etzioni, and others I am most likely missing have also made this distinction. The point here is just to give prior work its due, without taking away from all the amazing accomplishments of deep learning.

Yet more could be said about the work of explanation in abduction, such as Jerry Hobbs’ work on Tacitus. More could be said about the role of explanation in explanation-based reasoning and reasoning by analogy, as covered by some of the chapters in Machine Learning, Volume III. Veritas Aeterna (talk) 04:12, 21 January 2020 (UTC)

Restoring Discussion of Work in 70s, 80s, and 90s
Mindpit deleted the discussion I added of the history in symbolic reasoning work that focused on work related to explanation during the 70s-90s. The explanation was that it 'Removed an unnecessary line(and its references) that seems to have been put by one of the authors of the cited paper in order to promote his work.', but no explanation was provided on the talk page, and no mention of which work or reference was in question.

I am not the author or a co-author of ANY work cited in the text I added. If Mindpit has a specific objection to a reference could he / she please let me know and we can add other alternatives to make the same point?

The point of the section is to show the nature and volume of the work in symbolic reasoning at that time relevant to explanation, and to provide concrete examples to illustrate what is being discussed. If there is an objection to a specific reference, I can find other references to make the same point.

I'd request that Mindpit only remove the line or reference they object to, or which they desire additional references for, and not the whole section, too. I wanted to send a notice to Mindpit, but could not find a user or talk page for him or her, if he or she wants to reply here or notify me of any comments or disagreements.

Thanks.

Veritas Aeterna (talk) 20:56, 17 February 2020 (UTC)

What is a white box?
The article has this line: "Nevertheless, genetic programming naturally works as a white box.[24][25]"  What is a white box? It is not a common term in the industry, no definition is provided, and most worryingly, the two citations cited [24] and [25] provide no understanding whatsoever of what a white box might be. What is this sentence trying to accomplish in the article?

Thank you. Populationecology (talk) 13:15, 30 April 2020 (UTC)
 * Feel free to remove it right away if you like, otherwise we can leave it a week to see if someone can supply a source. Rolf H Nelson (talk) 04:02, 1 May 2020 (UTC)

Sounds good. I waited 10 days since your comment, and then removed the sentence as it was still unsupported. Anyone who would like can delete this talk section if you need to clean up the talk page, but for now I will leave it here.

Thank you. Populationecology (talk) 15:04, 11 May 2020 (UTC)

Proposing Split (Interpretable is not the same as explainable)
Interpretability and explainability are two related concepts that are often used interchangeably, but they have slightly different meanings in the context of machine learning and artificial intelligence. While both concepts aim to provide understanding and insight into how a machine learning model makes its predictions or decisions, they approach the problem from different perspectives. Interpretability refers to the ability to understand or make sense of the internal workings of a machine learning model. It focuses on understanding the relationships between the input features and the model's output. A model is considered interpretable if its inner workings can be easily understood by a human or if it can be represented in a simple and transparent manner. For example, a linear regression model is highly interpretable because the relationship between the input features and the output is explicitly expressed in the form of coefficients. Explainability, on the other hand, goes beyond interpretability and aims to provide a more comprehensive understanding of the model's behavior by explaining why a particular prediction or decision was made. It focuses on providing human-understandable explanations that can justify or rationalize the model's output. Explainable AI techniques try to answer questions such as "Why did the model make this prediction?" or "What were the key factors that influenced the decision?". The goal is to provide insights into the decision-making process of the model, often through the use of visualization, natural language explanations, or highlighting important features. In summary, interpretability is concerned with understanding the internal mechanics of a model, while explainability is concerned with providing understandable justifications for the model's predictions or decisions. Interpretability focuses on the model itself, while explainability focuses on the output and its reasoning. 
Both concepts are important in different contexts and have different techniques and tools associated with them. Geysirhead (talk) 11:38, 11 June 2023 (UTC)
 * If interpretability is generally a subset of explainability in the literature, I have no problem with the status quo. IMHO we should leave it all in one article unless/until it grows too long and needs to be split. Rolf H Nelson (talk) 19:09, 11 June 2023 (UTC)
 * Interpretability and explainability are related concepts, but they are not necessarily subsets of one another. Geysirhead (talk) 20:09, 12 June 2023 (UTC)
 * I agree with Geysirhead. I've been writing articles about AI, and I've been continuously frustrated by the fact that I can't provide a link to help people understand what interpretability means in ML because this page is all that exists, and it doesn't explain what the field of interpretability is about at all. As I understand it, XAI refers to AIs that were built to be interpretable while interpretability refers to the field. It seems nonsensical to me that XAI, one possible result of the field of interpretability, would have a page while the field itself isn't allowed to have one. If "interpretable AI" is considered too similar for some people, perhaps "interpretability (machine learning)" would be acceptable? Penrose Delta (talk) 16:11, 10 July 2023 (UTC)
 * There is indeed a distinction. If the article is to be split, what about the title "AI interpretability" (https://effectivethesis.org/thesis-topics/human-aligned-ai/mechanistic-interpretability/)? It would reuse the same pattern as some other article titles (such as AI safety or AI alignment). The title "Mechanistic interpretability" should also be considered; it is more likely to be searched as-is and is more clearly defined, although this term seems mostly used in research (primary sources). By the way, I agree that "interpretability (machine learning)" is probably also a better title than "interpretable AI". Alenoach (talk) 05:15, 21 August 2023 (UTC)
 * I am not sure that interpretability is consistently used to only refer to understanding the inner workings of a machine learning model. AWS Docs comments that "the terms interpretability and explainability are commonly interchangeable"; indeed, LIME is variously referred to as an interpretability or explainability technique. I think it's easiest to explain the nuances on a single page about both interpretable and explainable AI rather than having separate pages. Otherwise, I'm concerned that there will be considerable duplicated content across both pages. For example, is the paper "Language models explain neurons in language models" an interpretability or an explainability paper? I believe it could be considered both. Enervation (talk) 05:35, 24 July 2023 (UTC)
 * Would I be wrong in guessing that explanation leans heavily on social psychology (what is an explanation, anyway?) while interpretability is highly mathematical in nature? My fear is that "explanation" will turn into a sop. What might happen is that we build a large model of what people will accept as an explanation, and then we map the AI model to some convenient point in the space of acceptable explanation. Obviously, this can be done badly or it can be done well. But even when done well, is it useful other than for sop value? But then people go "it's not a sop, as you can see from this hardcore dive into interpretability". And then I go, "so far as anyone could tell, it was a sop until you laid out the interpretable equivalence, and so far as I'm concerned, the interpretable equivalence is wearing the pants here". Maybe it's just me, but I suspect that "explanation" is never going to float my own boat. I might be more convinced by accountable AI, though that also has a problematic social backdrop. MaxEnt 02:41, 5 August 2023 (UTC)


 * Geysirhead's argument makes sense to me, but I think these two topics are intertwined. If they are split, both articles would have to repeat a lot of the same material, if you want them to make sense to the general reader. (For example, they would both have to explain the urgent need in 2023 for solutions to these problems in medicine, law, finance, policing.) A suggestion: fix the lede. Define interpretable AI in the second paragraph, and clearly distinguish it from "explainable AI". CharlesTGillingham (talk) 08:40, 9 September 2023 (UTC)
 * Note for transparency: I had notified WikiProject Computer Science of this discussion with this edit to garner some more input. Felix QW (talk) 10:52, 26 November 2023 (UTC)

Soft oppose: As someone with only a tangential research-level understanding of the field, I would find it highly confusing if this were split into two articles. Based on Geysirhead's definitions above, I am unconvinced that these are really two different concepts, and not two different aspects of the same concept. The AWS note and Enervation's comment further suggest these are closely related. If some authors use the terms distinctly, wouldn't it be better to have a section such as "explainability versus interpretability" in the article itself? Caleb Stanford (talk) 16:57, 25 November 2023 (UTC)


 * Thank you for the proposal! Geysirhead (talk) 20:36, 25 November 2023 (UTC)

Wiki Education assignment: Research Process and Methodology - SU23 - Sect 200 - Thu
— Assignment last updated by NoemieCY (talk) 10:18, 28 July 2023 (UTC)