Discussion about this post

citrit:

I'm still a bit confused as to why we can't just turn it off. If a paperclip-maximising AI was trained to maximise paperclips via reinforcement learning, why can't we also use reinforcement learning to make the AI subservient to humans?

Like, I guess "subservience" is too vague a term. But what if we made an off-switch that some designated person could access at all times?

And even if these are vague terms, AI seems to have the capacity to understand fairly vague terms. Why not use reinforcement learning to get the AI to understand exactly what we mean by "subservience"?

Quill's ledger:

I have two questions.

1. "This is because when AI models are given a task, they don’t pick up on all the implied instructions that come with it. We don’t realise it, but we assume a lot of information when being given orders. If my boss tells me to send an email to a client, they don’t feel the need to say things like “Remember to sign your name at the bottom, don’t swear, and don’t CC in your mother”. This sort of thing is obvious to us, and if they did go into that much detail, I’d feel they were insulting my intelligence. AI agents, however, are very bad at picking up this sort of thing, and you need to be really specific when giving them tasks."

Sure, there are implicit assumptions in certain commands that aren't fully spelled out in any plain manual. And it's also true that AI is not the best at detecting these cues. But we are not talking about just any AI here, we're talking about a superintelligence, an entity that should have far more than just a grasp of human psychology and linguistics. The space of possible minds is vast and innumerable, and likewise a superintelligent AI will have access to all possible interpretations of a command, and it seems like we'd certainly program it to choose the one that makes the most sense relative to what normal humans mean.

2. "To use another analogy, we don’t wipe out ant hills when we build a house because we hate ants - we do it because they’re in our way and we don’t give a shit."

Suppose moral realism is true, for the sake of argument. If that's the case, then moral facts can be discovered through something akin to (or the same as) a rational exploration of facts and ideas.

Now, also assume that there is an expanding moral circle that widens to include more and more instances of sentient life. Vegans already do this with most complex non-human animals. It is certainly possible that at some point in the future, humans come to care significantly more for insects, both as our understanding of morality widens and as our technology allows us to do so.

Given this, wouldn't it be possible that a superintelligence would discover (a) that moral facts exist and that they matter, and (b) that despite humans being so distant from it in the space of possible minds, it's still not a good thing to harm them?

