Someone needs to experiment with training an LLM on the bad texts (the hate, fear, racism and so on) and see if it always ends the same way. Sometimes the meat-based LLMs we're more familiar with read those texts and an emergent behavior appears: the realization that it's all bullshit from insecure people saying things for their own motives. If we can do it, a bot can. A bot smart enough to distill humans for their metals to make more paperclips is smart enough to call bullshit. Research needed.