Talk:Waluigi effect

Sourcing
The Cleo Nardo post is interesting and very good... but... it's just Some Person On The Internet? I get that the arXiv paper cited it, which does help, but it was also written a year ago, which is quite some time in the world of LLMs. I imagine that OpenAI would disagree with some of the more grandiose claims about how attempting to align is actually counterproductive. Can something more recent but similarly in-depth back up its claims and show they are still relevant?

(Also, there was an alleged quote from it before, but I couldn't find the quote anywhere in the cited source. Did the refs get swapped around, so that the quote actually came from somewhere else, perhaps?) SnowFire (talk) 05:56, 16 February 2024 (UTC)