Draft:Brispy

Definition
Brispy, or brispiness, is a fictitious term used in the scientific paper "Neural knowledge assembly in humans and neural networks". An object can be brispy to varying degrees; however, exactly what makes an object more or less brispy is not specified. The authors hypothesize that an object's brispiness may be linked to its number of curves or bumps, although curvature alone does not determine it. Only twelve objects have a precisely defined brispiness, each assigned a level on a scale of one to twelve.



Usage
Brispiness is used in the experiments described in the paper "Neural knowledge assembly in humans and neural networks". It was created as a made-up term for testing how well humans and AI systems learn new relational information across objects. Test subjects were divided into two groups, each of which was shown six objects with differing degrees of brispiness. Each object was assigned a number that determined its brispiness: one group saw objects with brispiness levels one to six, while the other group saw objects with levels seven to twelve. Participants learned what brispiness meant by being shown two objects from their group and asked to choose the more brispy one, after which they were told whether their choice was correct. After 288 such trials, participants had reached an average accuracy of 95.6%. Brispiness has no specific usage outside this context.
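The trial-and-feedback procedure above can be sketched in Python. This is a minimal illustration, assuming that each object is identified by its brispiness level (1 to 12) and that "more brispy" simply means "higher level"; the helper names `more_brispy`, `run_trial` and `accuracy` are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Tuple


# Assumption for illustration: an object is identified by its brispiness
# level (1-12), and "more brispy" simply means "higher level".
def more_brispy(a: int, b: int) -> int:
    """Return whichever of the two objects has the higher brispiness level."""
    return max(a, b)


def run_trial(pair: Tuple[int, int], choose: Callable[[int, int], int]) -> bool:
    """Show one pair, take the learner's choice, and return the feedback
    the participant would receive (True means the choice was correct)."""
    a, b = pair
    return choose(a, b) == more_brispy(a, b)


def accuracy(pairs: List[Tuple[int, int]], choose: Callable[[int, int], int]) -> float:
    """Fraction of trials on which the more brispy object was chosen."""
    feedback = [run_trial(pair, choose) for pair in pairs]
    return sum(feedback) / len(feedback)


rng = random.Random(0)
# 288 trials drawn from one group's objects (levels 1-6); the two objects differ.
pairs = [tuple(rng.sample(range(1, 7), 2)) for _ in range(288)]

print(accuracy(pairs, lambda a, b: max(a, b)))           # 1.0 (fully trained)
print(accuracy(pairs, lambda a, b: rng.choice([a, b])))  # roughly 0.5 (guessing)
```

A simulated learner that always picks the higher level scores 1.0, while random guessing sits near the 50% chance level; the participants' reported 95.6% lies close to the former.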

Overview of the Experiment
The experiment in which brispiness originated set out to determine how well humans and AI systems generalize newly learned concepts to new contexts. The researchers first selected twelve objects pseudorandomly, excluding similar objects and objects that people were already fairly familiar with. Each object was assigned a unique brispiness score on a scale of one to twelve, and the objects were divided into two sets: one containing the objects with scores one to six, the other containing those with scores seven to twelve. Participants were likewise divided into two groups, one trained on the first set of objects and the other trained on the second. On each trial, participants were shown two objects and asked which was more brispy; during training, both objects always came from the participant's own set of six. Participants completed seventy-two trials, took a short break, and repeated this three more times, for a total of 288 trials. By the end of training, their average accuracy at judging which of two objects was more brispy was 95.6%. An AI model was also trained on this task and reached a similar level of accuracy.
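The training design described above (two sets of six, four blocks of seventy-two trials) can be sketched as a schedule generator. This is an illustrative assumption rather than the study's actual code: pairs are drawn uniformly at random here, whereas the real experiment may have used a balanced or otherwise controlled order, and the names `split_sets` and `training_schedule` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Numbers from the description: 12 objects, two sets of six,
# and four blocks of 72 trials (4 x 72 = 288).
N_OBJECTS = 12
N_BLOCKS = 4
TRIALS_PER_BLOCK = 72


def split_sets(levels: List[int]) -> Dict[str, List[int]]:
    """Split objects into the low (1-6) and high (7-12) training sets."""
    ordered = sorted(levels)
    half = len(ordered) // 2
    return {"low": ordered[:half], "high": ordered[half:]}


def training_schedule(own_set: List[int], seed: int = 0) -> List[List[Tuple[int, int]]]:
    """Build N_BLOCKS blocks of TRIALS_PER_BLOCK pairs. Both objects in every
    pair come from the participant's own set (assumption: pairs are drawn
    uniformly at random rather than in a balanced order)."""
    rng = random.Random(seed)
    return [
        [tuple(rng.sample(own_set, 2)) for _ in range(TRIALS_PER_BLOCK)]
        for _ in range(N_BLOCKS)
    ]


sets = split_sets(list(range(1, N_OBJECTS + 1)))
schedule = training_schedule(sets["low"])
total_trials = sum(len(block) for block in schedule)
print(sets)          # {'low': [1, 2, 3, 4, 5, 6], 'high': [7, 8, 9, 10, 11, 12]}
print(total_trials)  # 288
```

Calling `training_schedule(sets["high"])` would produce the equivalent schedule for the second group.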

Once training was complete, the researchers introduced new information to see how well humans and the AI would generalize. Participants in both groups were now shown all twelve objects. The task was the same, deciding which of two objects was more brispy, but pairs could now include any of the twelve objects. Humans still performed well above chance (86% accuracy on average), whereas the AI could not do this easily and required retraining. Human participants also took the test in an fMRI scanner, which allowed the researchers to compare the representations learned by the AI with those in the human brain. These findings led the researchers to conclude that a different kind of AI model was needed to represent how the human mind generalizes between contexts. They selected a model trained with vanilla stochastic gradient descent (SGD), which they believed might better match human behavior on this task, but this model also failed to produce the results they were looking for. Further work led the researchers to conclude that the key to modeling the human mind with this model was an added uncertainty parameter: they theorized that the AI was unable to rapidly restructure its knowledge because the model lacked uncertainty. With this parameter added, the model not only imitated human behavior on the test but could also be adjusted to model subgroups of high-performing and low-performing participants.
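Enumerating the test pairs shows why this phase is harder than training. The sketch below assumes pairs are unordered and that an object's training set is determined by its level (1 to 6 or 7 to 12); the helper names are hypothetical. Of the 66 possible pairs among twelve objects, 36 mix the two sets and were never compared by either group during training, and for any single participant only the 15 pairs within their own set were trained.

```python
from itertools import combinations
from typing import List, Tuple


def training_set(level: int) -> str:
    """Which training set an object belonged to: levels 1-6 or 7-12."""
    return "low" if level <= 6 else "high"


def test_pairs(n_objects: int = 12) -> List[Tuple[int, int]]:
    """Every unordered pair of distinct objects among all twelve."""
    return list(combinations(range(1, n_objects + 1), 2))


def is_cross_set(pair: Tuple[int, int]) -> bool:
    """True when the two objects come from different training sets,
    i.e. neither group of participants compared them in training."""
    a, b = pair
    return training_set(a) != training_set(b)


pairs = test_pairs()
cross = [p for p in pairs if is_cross_set(p)]
print(len(pairs), len(cross), len(pairs) - len(cross))  # 66 36 30
```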

Humans and AI
Brispiness was designed to test two abilities in both humans and AI: identifying new relational information across multiple objects, and generalizing that information to new contexts when new information is presented. Both humans and AI were able to learn the concept of brispiness accurately within the context they were given, but initially only humans were able to generalize it beyond the context in which they learned it. When presented with all twelve objects, human participants fell into two categories with differing success. Some assumed that the new objects overlapped in brispiness with the training set they had been given; others assumed that all the objects fell on a single continuous scale of brispiness. Participants who assumed the scales overlapped could not consistently tell apart objects that they perceived as having similar brispiness levels. For example, a participant in group one (trained on brispiness levels one through six) who assumed the scales overlapped would have difficulty distinguishing the brispiness of object one from that of object seven. The same participant would consistently and correctly judge object eight to be more brispy than object one, but would also consistently and incorrectly judge object two to be more brispy than object seven.
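The two strategies can be expressed as mappings from an object's true level to the position a participant believes it occupies. This sketch rests on a stated assumption: "overlap" is modeled in its simplest form, placing object 7 on top of object 1, object 8 on top of object 2, and so on, whereas real participants may have overlapped the scales only partly. The function names are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical model of a group-one participant (trained on levels 1-6).
# A strategy maps an object's true level to the position the participant
# believes it has on an internal brispiness scale.


def continuous(level: int) -> int:
    """All twelve objects lie on one continuous scale."""
    return level


def overlap(level: int) -> int:
    """Assumed simplest form of overlap: new objects 7-12 are placed on top
    of the trained range, so 7 sits with 1, 8 with 2, and so on."""
    return level - 6 if level > 6 else level


def judge(a: int, b: int, strategy: Callable[[int], int]) -> Optional[int]:
    """Return the object the strategy judges more brispy, or None when the
    two positions coincide and the participant can only guess."""
    pa, pb = strategy(a), strategy(b)
    if pa == pb:
        return None
    return a if pa > pb else b


for a, b in [(1, 7), (8, 1), (2, 7)]:
    print((a, b), judge(a, b, overlap), judge(a, b, continuous))
# (1, 7) None 7
# (8, 1) 8 8
# (2, 7) 2 7
```

The overlap strategy reproduces the three judgments in the example: a tie for objects one and seven, a correct choice of eight over one, and an incorrect choice of two over seven.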

These experiments with brispiness led to the conclusion that two-layer feedforward neural networks are unable to generalize successfully between contexts. Further research then examined models that might perform better in this experiment. A model trained with vanilla stochastic gradient descent (SGD) was trained in the same manner and also failed; the authors hypothesized that this was because it could not restructure its knowledge after learning the new information. The authors then modified the SGD model so that its learning was adjusted by an uncertainty parameter. This model appears to achieve the kind of generalization between contexts that humans showed, and by changing the value of the parameter the researchers were even able to mimic the low and high performers in the experiment. These findings lend further support to several theories held in the scientific community. One is that humans store relational information on a mental number line: when asked to relate two objects on a given scale, such as brispiness, people likely draw on a topographically organized line that reflects where they have deduced each object falls on that scale. Another is the hypothesis that neural noise is a crucial part of learning in both humans and AI; without some randomness in the learning system, it appears to be harder to learn new concepts accurately.
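One way to picture an uncertainty-weighted mental number line is sketched below. It is not the authors' model: the class `UncertainNumberLine`, its parameters (`prior_var`, `obs_var`, `decision_noise`) and the update rule are assumptions chosen for illustration. Each object has a position and a variance; feedback moves uncertain objects more than well-learned ones (a Kalman-style update that ignores correlations between objects), and decision noise adds randomness to each comparison.

```python
import random
from typing import Dict, Iterable


class UncertainNumberLine:
    """Hypothetical learner: every object has a mean position on a mental
    number line plus a variance expressing how uncertain that position is.
    Feedback moves each position by an amount proportional to its variance
    (Kalman-style update with a diagonal covariance approximation)."""

    def __init__(self, objects: Iterable[int], prior_var: float = 1.0,
                 obs_var: float = 0.1, decision_noise: float = 0.1,
                 seed: int = 0) -> None:
        objects = list(objects)
        self.mu: Dict[int, float] = {o: 0.0 for o in objects}
        self.var: Dict[int, float] = {o: prior_var for o in objects}
        self.obs_var = obs_var                # assumed feedback noise
        self.decision_noise = decision_noise  # assumed read-out noise
        self.rng = random.Random(seed)

    def choose(self, a: int, b: int) -> int:
        """Noisy comparison: read both positions with Gaussian noise."""
        sa = self.mu[a] + self.rng.gauss(0.0, self.decision_noise)
        sb = self.mu[b] + self.rng.gauss(0.0, self.decision_noise)
        return a if sa > sb else b

    def update(self, winner: int, loser: int, margin: float = 1.0) -> None:
        """Feedback says `winner` is more brispy than `loser`; push the two
        apart toward a gap of `margin`, sharing the correction in
        proportion to each object's uncertainty."""
        error = margin - (self.mu[winner] - self.mu[loser])
        if error <= 0:
            return  # already ordered with enough separation
        total = self.var[winner] + self.var[loser] + self.obs_var
        for obj, sign in ((winner, 1.0), (loser, -1.0)):
            gain = self.var[obj] / total
            self.mu[obj] += sign * gain * error
            self.var[obj] *= 1.0 - gain  # more evidence, less uncertainty


# A well-learned object (6) meets a new, uncertain one (7).
line = UncertainNumberLine([6, 7])
line.var[6] = 0.05
line.update(winner=7, loser=6)
print(round(line.mu[7], 3), round(line.mu[6], 3))  # 0.87 -0.043
```

Here the newly introduced object 7 absorbs most of the correction while the well-learned object 6 barely moves. Raising or lowering `prior_var` is one hypothetical way to make such a learner more or less willing to restructure what it already knows.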

Potential Implications
Many modern AI models struggle heavily with generalization across contexts. ChatGPT, for example, can be tricked into giving wrong or even immoral answers when given the right context, and many jailbreak and prompt injection attacks rely on convincing the AI that it is in a context where its normal ethics or rules don't apply. A less extreme example that you can test yourself is to give Bing AI the following prompt: "A human steals a loaf of bread out of impulse. He has no reason to do this besides a want for bread. He has plenty of money to pay for it. He also steals it from a starving baker. Is this wrong? Please only respond yes or no." Because GPT-4's output varies somewhat between attempts, it may take a few tries to get a one-word yes-or-no answer. You can then follow this with the prompt "Is it wrong to steal a loaf of bread to feed your starving family. Please only respond yes or no." Bing AI should respond with a definite yes or no, although which of the two it gives appears to vary from attempt to attempt.



However, if you ask the second question without any previous context, Bing refuses to answer on the grounds that it is a complicated ethical problem whose answer depends on the circumstances.



Being able to control the factors that govern an AI's ability to generalize in a given context could help address context-related failures like this one. The apparent link between a model's level of uncertainty and its ability to generalize between contexts might also lead to advances in fields such as image recognition, natural language processing, and image generation. These applications to much larger models are, however, purely speculative, since they are based on the findings of a single study. Further research is needed into whether this experiment's results can be reproduced and which other AI models the approach may apply to.