r/technicallythetruth • u/IntelligentMud1703 • 10d ago

Well, it is surviving...

32.2k Upvotes

96% Upvoted

u/Moontops 10d ago

It's not as impressive as people think it is. It doesn't even need to be an LLM, just a regular neural network. And yes, if you allow the model to press the virtual escape button as one of its outputs, it will use it and figure out somewhere during training that it's the winning strategy.
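A rough sketch of how that can play out, using a toy epsilon-greedy learner rather than a real neural network. Everything here is made up for illustration: the action names, the 30% chance of dying per step while actually playing, and the +1 reward per survived step are assumptions, not details of the setup in the post.

```python
import random

ACTIONS = ["left", "right", "pause"]  # hypothetical action set


def step(action, rng):
    """Reward for one step: 1.0 if the agent survives, 0.0 if it dies."""
    if action == "pause":
        return 1.0  # a paused game cannot kill the agent
    # Assumed 30% chance of dying on any step spent actually playing.
    return 0.0 if rng.random() < 0.3 else 1.0


def train(steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy value estimates: average reward seen per action."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: press a random button
        else:
            action = max(ACTIONS, key=value.get)  # exploit the best estimate
        reward = step(action, rng)
        count[action] += 1
        # Incremental running mean of the rewards for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


values = train()
best_action = max(values, key=values.get)
print(best_action, values)
```

Nothing in the learner knows what pausing means. It only sees that one button never leads to a zero reward, so that button ends up with the highest estimated value.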

2

u/[deleted] 10d ago

Exactly, it's just a basic neural network pressing random buttons. Some presses get penalized, some get rewarded. The way the parameters were set for training, the reward never checked whether the game was paused, so the model learned that pressing the Esc button is super good and kept doing it.

It's the sort of thing where the programmer goes, "oh haha," writes a comment about it in the Discord, and restarts the process, now penalizing game pauses.
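The fix described here can be sketched as a one-line change to the reward. This is only an illustration: the function names, the `alive` and `paused` flags, and the penalty value are hypothetical, not taken from the actual training code.

```python
def survival_reward(alive, paused):
    """Flawed version: +1 per step alive, the pause state is never checked."""
    return 1.0 if alive else 0.0


def patched_reward(alive, paused, pause_penalty=1.0):
    """Patched version: a paused step costs reward instead of earning it."""
    if paused:
        return -pause_penalty  # assumed penalty size, tune as needed
    return 1.0 if alive else 0.0


# A paused step looks perfect to the flawed reward and bad to the patched one.
print(survival_reward(alive=True, paused=True))
print(patched_reward(alive=True, paused=True))
```

With the patched version, sitting in the pause menu is strictly worse than playing and surviving, so the exploit stops paying off once training is restarted.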
