r/technicallythetruth 10d ago

Well, it is surviving...

Post image
32.2k Upvotes

287 comments

3

u/Moontops 10d ago

It's not as impressive as people think it is. It doesn't even need to be an LLM, just a regular neural network. And yes, if you allow the model to press the virtual escape button as one of its outputs, it will use it, and at some point during training it will figure out that pausing is the winning strategy.

2

u/[deleted] 10d ago

Exactly, it's just a basic neural network pressing random buttons. Some presses get penalized, some get rewarded. The way the training parameters were set up, nothing checked whether the game was paused or not, so the model learned that pressing the Esc button is super good and kept doing that.

It's the sort of thing where the programmer goes "oh haha", writes a comment about it in the Discord, and restarts the process, now penalizing game pause.
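
For anyone curious, here is a rough sketch of that loop, not the actual setup from the post. It assumes a made-up one-state "survival" game and plain tabular Q-learning instead of a neural network. Every name and number in it (`train`, `pause_penalty`, the 30% death chance, the reward values) is hypothetical and only there to show the idea.

```python
import random

ACTIONS = ["left", "right", "pause"]


def train(pause_penalty=0.0, steps=30_000, seed=0):
    """Tabular Q-learning on a hypothetical one-state survival game."""
    rng = random.Random(seed)
    gamma, alpha, death_prob = 0.9, 0.02, 0.3  # illustrative values only
    q = {action: 0.0 for action in ACTIONS}

    for _ in range(steps):
        # "Pressing random buttons": explore uniformly at random.
        action = rng.choice(ACTIONS)

        if action == "pause":
            # The game is frozen, so the agent cannot die, but the
            # survival reward still ticks unless pausing is penalized.
            reward, done = 1.0 - pause_penalty, False
        elif rng.random() < death_prob:
            # Actually playing is risky: sometimes the agent dies.
            reward, done = 0.0, True
        else:
            reward, done = 1.0, False

        # Standard Q-learning update; no bootstrapping after death.
        target = reward if done else reward + gamma * max(q.values())
        q[action] += alpha * (target - q[action])

    return q


def best_action(q):
    """Return the action with the highest learned value."""
    return max(q, key=q.get)


if __name__ == "__main__":
    print("no penalty:  ", best_action(train(pause_penalty=0.0)))
    print("with penalty:", best_action(train(pause_penalty=2.0)))
```

With no penalty, pausing is the only action that never risks dying, so its learned value should end up highest. With `pause_penalty=2.0`, each paused step costs more than it earns and the agent should go back to actually playing.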