Sally–Anne test



The Sally–Anne test is a psychological test used in developmental psychology to measure a person's social cognitive ability to attribute false beliefs to others. Based on the earlier ground-breaking study by Wimmer and Perner (1983), the Sally–Anne test was so named by Simon Baron-Cohen, Alan M. Leslie, and Uta Frith (1985), who developed the test further. In 1988, Leslie and Frith repeated the experiment with human actors (rather than dolls) and found similar results.

Test description
To develop an efficacious test, Baron-Cohen et al. modified the puppet play paradigm of Wimmer and Perner (1983), in which puppets represent tangible characters in a story, rather than hypothetical characters of pure storytelling.

In the test process, after introducing the dolls, the child is asked the control question of recalling their names (the Naming Question). A short skit is then enacted: Sally takes a marble and hides it in her basket. She then "leaves" the room and goes for a walk. While she is away, Anne takes the marble out of Sally's basket and puts it in her own box. Sally then returns, and the child is asked the key question, the Belief Question: "Where will Sally look for her marble?"

In the Baron-Cohen, Leslie, and Frith study of theory of mind in autism, 61 children (20 diagnosed as autistic under established criteria, 14 with Down syndrome, and 27 determined to be clinically unimpaired) were tested with "Sally" and "Anne".

Outcomes
For a participant to pass this test, they must answer the Belief Question correctly by indicating that Sally believes that the marble is in her own basket. This answer is consistent with Sally's perspective, but not with the participant's own. If the participant cannot take an alternative perspective, they will indicate that Sally has cause to believe, as the participant does, that the marble has moved. Passing the test is thus seen as evidence that the participant understands that Sally has her own beliefs, which may not correspond to reality; this is the core requirement of theory of mind.

In the Baron-Cohen et al. (1985) study, 23 of the 27 clinically unimpaired children (85%) and 12 of the 14 children with Down syndrome (86%) answered the Belief Question correctly. However, only four of the 20 children with autism (20%) answered correctly. Overall, children under the age of four, along with most autistic children (of older ages), answered the Belief Question with "Anne's box", seemingly unaware that Sally does not know her marble has been moved.
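
The percentages above follow directly from the reported counts. As a quick check, the short Python sketch below recomputes them; the group labels and the helper name pass_rate_percent are illustrative choices, not terms from the study.

```python
# Counts reported in the text above: (answered correctly, group size).
results = {
    "clinically unimpaired": (23, 27),
    "Down syndrome": (12, 14),
    "autistic": (4, 20),
}


def pass_rate_percent(passed, total):
    """Return the pass rate as a percentage, rounded to the nearest integer."""
    return round(100 * passed / total)


# Pass rate per group and the overall number of children tested.
rates = {group: pass_rate_percent(p, t) for group, (p, t) in results.items()}
total_children = sum(total for _, total in results.values())

for group, rate in rates.items():
    print(f"{group}: {rate}%")
print(f"children tested: {total_children}")
```

The group sizes sum to the 61 children described in the test description, and the rounded rates match the figures quoted in the text.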

Criticism
While Baron-Cohen et al.'s data have been purported to indicate a lack of theory of mind in autistic children, other factors may have affected the results. For instance, autistic individuals may pass the cognitively simpler recall task, but language issues in both autistic children and deaf controls tend to confound results.

Ruffman, Garnham, and Rideout (2001) further investigated links between the Sally–Anne test and autism in terms of eye gaze as a social communicative function. They added a third possible location for the marble: the pocket of the investigator. When autistic children and children with moderate learning disabilities were tested in this format, they found that both groups answered the Belief Question equally well; however, participants with moderate learning disabilities reliably looked at the correct location of the marble, while autistic participants did not, even if the autistic participant answered the question correctly. These results may be an expression of the social deficits associated with autism.

Tager-Flusberg (2007) states that in spite of the empirical findings with the Sally–Anne task, there is a growing uncertainty among scientists about the importance of the underlying theory-of-mind hypothesis of autism. In all studies that have been done, some children with autism pass false-belief tasks such as Sally–Anne.

In other hominids
Eye tracking of chimpanzees, bonobos, and orangutans suggests that all three anticipate the false beliefs of a subject in a King Kong suit, and pass the Sally–Anne test.

Artificial intelligence
Artificial intelligence and computational cognitive science researchers have long attempted to computationally model humans' ability to reason about the (false) beliefs of others in tasks like the Sally–Anne test. Many approaches have been taken to replicate this ability in computers, including neural network approaches, epistemic plan recognition, and Bayesian theory-of-mind. These approaches typically model agents as rationally selecting actions based on their beliefs and desires, which can be used to either predict their future actions (as in the Sally–Anne test), or to infer their current beliefs and desires. In constrained settings, these models are able to reproduce human-like behavior on tasks similar to the Sally–Anne test, provided that the tasks are represented in a machine-readable format.
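
To illustrate what a machine-readable representation of the task might look like, the following Python sketch tracks each agent's beliefs separately from the true state of the world and predicts an action from belief alone. It is a deliberately minimal toy under stated assumptions, not an implementation of any of the approaches named above: the class names Agent and World, the method names, and the rule that only agents who are present observe a change are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    present: bool = True
    # Maps an object to the location where this agent believes it is.
    beliefs: dict = field(default_factory=dict)


class World:
    def __init__(self, agents):
        self.agents = {agent.name: agent for agent in agents}
        self.locations = {}  # Maps an object to its true location.

    def move(self, obj, location):
        """Move an object; only agents who are present observe the change."""
        self.locations[obj] = location
        for agent in self.agents.values():
            if agent.present:
                agent.beliefs[obj] = location

    def leave(self, name):
        self.agents[name].present = False

    def enter(self, name):
        # Assumption: returning does not reveal the object's location,
        # so the agent's earlier belief is left unchanged.
        self.agents[name].present = True

    def predict_search(self, name, obj):
        """Predict where an agent looks: the place it believes the object is."""
        return self.agents[name].beliefs[obj]


def run_sally_anne():
    world = World([Agent("Sally"), Agent("Anne")])
    world.move("marble", "basket")  # Sally hides the marble in her basket.
    world.leave("Sally")            # Sally goes for a walk.
    world.move("marble", "box")     # Anne moves the marble while Sally is away.
    world.enter("Sally")            # Sally returns.
    return world.predict_search("Sally", "marble"), world.locations["marble"]


predicted, actual = run_sally_anne()
print(f"Sally will look in the {predicted}; the marble is in the {actual}.")
```

In this toy model the predicted search location ("basket") differs from the marble's true location ("box"), which is the distinction the Belief Question probes. A system that answered from the true world state instead of from Sally's stored belief would give the failing answer.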

On March 22, 2023, a research team from Microsoft released a paper showing that the LLM-based AI system GPT-4 could pass an instance of the Sally–Anne test, which the authors interpret as "suggest[ing] that GPT-4 has a very advanced level of theory of mind." However, the generality of this finding has been disputed by several other papers, which indicate that GPT-4's ability to reason about the beliefs of other agents remains limited (59% accuracy on the ToMi benchmark), and is not robust to "adversarial" changes to the Sally–Anne test that humans flexibly handle. While some authors argue that the performance of GPT-4 on Sally–Anne-like tasks can be increased to 100% via improved prompting strategies, this approach appears to improve accuracy to only 73% on the larger ToMi dataset. In related work, researchers have found that LLMs do not exhibit human-like intuitions about the goals that other agents reach for, and that they do not reliably produce graded inferences about the goals of other agents from observed actions. The degree to which LLMs such as GPT-4 can perform social reasoning thus remains an active area of research.