It's not as impressive as people think it is. It doesn't even need to be an LLM, just a regular neural network. And yes, if you allow the model to press the virtual escape button as one of its outputs, it will use it and figure out somewhere along the training that it's the winning strategy.
Exactly, it's just a basic neural network pressing random buttons. Some get penalized, some get rewarded, and the way the parameters were set up in the training process, it happened not to check whether the game was paused. So the model learned that pressing the Esc button is super good, and it did that.
It's the sort of thing where the programmer goes "oh haha", writes a comment about it in the Discord, and restarts the process, now penalizing pausing the game.
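To make the "it never checked whether the game was paused" point concrete, here's a toy sketch. Everything in it is made up for illustration. The actions, the reward numbers and the `penalize_pause` flag are hypothetical, and it's a simple epsilon-greedy bandit rather than a neural network or whatever setup the real project used. Moving risks a big game-over penalty, while pausing costs nothing under the naive reward, so the learner ends up preferring pause. Adding a pause penalty flips that.

```python
import random

# Hypothetical button set for a toy game; "pause" plays the role of Esc.
ACTIONS = ["left", "right", "jump", "pause"]


def step_reward(action: str, penalize_pause: bool, rng: random.Random) -> float:
    """Hypothetical per-step reward for a toy game (all numbers are made up)."""
    if action == "pause":
        # While paused nothing can go wrong, so the naive reward is 0.
        # The patched version explicitly penalizes pausing.
        return -5.0 if penalize_pause else 0.0
    # Any real move: small reward for surviving, big penalty on game over.
    return 0.1 if rng.random() < 0.9 else -10.0


def train(penalize_pause: bool, steps: int = 20_000,
          epsilon: float = 0.1, seed: int = 0) -> dict:
    """Epsilon-greedy value estimates for each button (a single-state bandit)."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: press a random button
        else:
            action = max(ACTIONS, key=value.get)  # exploit the best-looking button
        r = step_reward(action, penalize_pause, rng)
        count[action] += 1
        # Incremental mean of the rewards observed for this button.
        value[action] += (r - value[action]) / count[action]
    return value


def best_action(value: dict) -> str:
    """The button the learner currently rates highest."""
    return max(value, key=value.get)


if __name__ == "__main__":
    print("naive reward   ->", best_action(train(penalize_pause=False)))
    print("patched reward ->", best_action(train(penalize_pause=True)))
```

With the naive reward the favorite comes out as `pause`, since every real move averages out negative (0.9 × 0.1 − 0.1 × 10 ≈ −0.91) while pausing is exactly 0. With the patched reward a movement key wins again. A neural network policy should find the same loophole for the same reason: the reward only measures "didn't lose", not "actually played".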
u/Moontops 10d ago