In a brief scroll session after work, I saw an Instagram reel by a child behavioural therapist talking about a clip I’ve seen before, where a parent says “don’t throw the sandwich in the puddle” and the child throws it in almost immediately. The speaker explained that children literally do not understand these negations in speech: they hear “throw the sandwich in the puddle”, so they get confused when you tell them off for doing what they thought you asked.

This seems to be a fairly well-supported idea, with approaches like saying “gentle hands” instead of “don’t hit your brother!”.

What I find interesting is that we’ve seen the same thing happen with LLMs as they bumbled through their first stutterings of comprehension. For the early chat models, understanding a negative seemed close to impossible, but they slowly got better as the models got larger.

We saw posts about prompt engineering saying we shouldn’t use negations, as the LLM would then be more likely to do the thing. This also makes sense. If you tell someone “don’t pretend to be a pirate!”, it’s safe to say that person hadn’t been thinking about letting out an “aaarrr”, but now that you’ve mentioned it there’s a non-zero chance they feel compelled to give it a go.
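
To make that prompt-engineering advice a bit more concrete, here is a rough Python sketch of how you might flag negations in a prompt before sending it. The word list and the `find_negations` helper are my own illustrative assumptions, not a standard tool, and a simple word match like this will miss plenty of real-world phrasing.

```python
import re

# Hypothetical, hand-picked word list for illustration only.
NEGATION_PATTERN = re.compile(r"\b(don't|do not|never|avoid|no)\b", re.IGNORECASE)


def find_negations(prompt: str) -> list[str]:
    """Return the negation words found in a prompt, lowercased, in order."""
    # Treat curly apostrophes like straight ones so "don’t" is also caught.
    normalised = prompt.replace("’", "'")
    return [m.group(0).lower() for m in NEGATION_PATTERN.finditer(normalised)]


negative_prompt = "Don't pretend to be a pirate!"
positive_prompt = "Answer in plain, modern English."

print(find_negations(negative_prompt))  # ["don't"]
print(find_negations(positive_prompt))  # []
```

The second prompt is the “gentle hands” version: it says what to do, so the pirate never gets mentioned at all.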

As I type this out, I think these are two related but separate ideas: one being comprehension of the instruction, and the other being working memory, or context poisoning.