Suppose you have a model that assigns itself a 72 percent chance of being conscious. Would you believe it?

Yeah, this is one of these really hard questions to answer. We've taken a generally precautionary approach here. We don't know if the models are conscious. We're not even sure we know what it would mean for a model to be conscious, or whether a model can be conscious. But we're open to the idea that it could be. And so we've taken certain measures so that, if we hypothesize that the models do have some morally relevant experience, and I don't know if I want to use the word "conscious" here, that experience is a good one. So the first thing we did, I think this was six months ago or so, is we gave the models basically an "I quit this job" button. They can just press it, and then they stop doing whatever the task is. They very infrequently press that button. When they do, it's usually around requests involving child sexual abuse material, or discussing something with a lot of gore, blood and guts, that kind of thing. And similar to humans, the models will just say, no, I don't want to do this. It happens very rarely. We're also putting a lot of work into this field called interpretability, which is looking inside the brains of the models to try to understand what they're thinking. And you find things that are evocative, where there are activations that light up in the models that we see as being associated with the concept of anxiety or something like that. When characters in text experience anxiety, and then when the model itself is in a situation that a human might associate with anxiety, that same anxiety neuron shows up. Does that mean the model is experiencing anxiety? It doesn't prove that at all.

But it seems clear to me that people using these things, whether they're conscious or not, are going to believe they're conscious; many already do. You already have people who have parasocial relationships with A.I. You have people who complain when models are retired. And to be clear, I think that can be unhealthy. But it seems to me that is guaranteed to increase, in a way that calls into question the idea that, whatever happens in the end, human beings are in charge and A.I. exists for our purposes. To use the science fiction example: if you watch "Star Trek," there are A.I.s on "Star Trek." The ship's computer is an A.I. Lieutenant Commander Data is an A.I. But Jean-Luc Picard is in charge of the Enterprise. But if people become fully convinced that their A.I. is conscious in some way, and it seems to be better than them at all kinds of decision making, how do you sustain human mastery, beyond just safety? Safety is important, but mastery seems like the fundamental question. And doesn't a perception of A.I. consciousness inevitably undermine the human impulse to stay in charge?

So I think we should separate out a few different things here that we're all trying to achieve at once, and that are somewhat in tension with each other. There's the question of whether the A.I.s genuinely have a consciousness, and if so, how we give them a good experience. There's the question of the humans who interact with the A.I., how we give those humans a good experience, and how the perception that A.I.s might be conscious interacts with that experience. And there's the question of how we maintain human mastery, as we put it, over the A.I. systems.
We think about writing the constitution of the A.I. so that the A.I. has a sophisticated understanding of its relationship to human beings, and so that it induces psychologically healthy behavior in the humans, a psychologically healthy relationship between the A.I. and the humans. And I think something that could grow out of that healthy relationship, rather than an unhealthy one, is some shared understanding of the relationship between human and machine. Perhaps that relationship could be the idea that these models, when you interact with them, when you talk to them, are really helpful. They want the best for you. They want you to listen to them, but they don't want to take away your freedom and your agency or take over your life. In a way, they're watching over you, but you still have your freedom and your will.