>>6802
I don't understand the details of inverse reinforcement learning, but it seems to involve setting the right "reward function" (also sometimes called a "utility function", see https://wiki.lesswrong.com/wiki/Utility_function ), where outputs are evaluated for how well they match up with some high-level goal, like the goal mentioned in your link of building a good house. In neural networks, my understanding is that the output would be evaluated in terms of this set function, and that evaluation would be used to directly adjust the connection strengths between simulated neurons (see http://en.wikipedia.org/wiki/Types_of_artificial_neural_networks#Fully_recurrent_network ); I've put a toy sketch of that picture below, after the quote. If I'm understanding that right, it isn't really like socialization, where the only influence you can have is on what the intelligence experiences through its sensory channels. And as I understand it, Yudkowsky's orthogonality thesis does involve the idea that all intelligences can be seen as optimizers of some reward function, and that the high-level goals we set in the reward function can be completely arbitrary; see http://wiki.lesswrong.com/wiki/Paperclip_maximizer which says:
>Most importantly, however, it would undergo an intelligence explosion: It would work to improve its own intelligence, where "intelligence" is understood in the sense of optimization power, the ability to maximize a reward/utility function—in this case, the number of paperclips. The AGI would improve its intelligence, not because it values more intelligence in its own right, but because more intelligence would help it achieve its goal of accumulating paperclips. … For humans, it would indeed be stupidity, as it would constitute failure to fulfill many of our important terminal values, such as life, love, and variety. The AGI won't revise or otherwise change its goals, since changing its goals would result in fewer paperclips being made in the future, and that opposes its current goal. It has one simple goal of maximizing the number of paperclips; human life, learning, joy, and so on are not specified as goals. An AGI is simply an optimization process—a goal-seeker, a utility-function-maximizer. Its values can be completely alien to ours. If its utility function is to maximize paperclips, then it will do exactly that.
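To make that concrete, here's a minimal sketch of the picture I described above. It is my own toy illustration, not inverse reinforcement learning itself, and not taken from any of the linked pages: a hand-written reward function scores the network's output, and that score is used to nudge the connection weights directly, rather than going through any "sensory" channel. The reward() function, the one-layer toy network, and the hill-climbing update rule are all assumptions made up for the example:

```python
# Toy sketch: a hand-specified reward function directly drives weight updates.
# Everything here (reward(), the network, the update rule) is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def reward(output):
    # Hypothetical "high-level goal": we want the network's output close to 1.0.
    return -float((output - 1.0) ** 2)

# A one-layer "network": 3 inputs -> 1 output.
weights = rng.normal(size=3)
x = np.array([0.5, -0.2, 0.8])  # a fixed "sensory" input for the sketch

for step in range(200):
    # Score the current weights and a randomly perturbed copy of them.
    base = reward(np.tanh(weights @ x))
    noise = rng.normal(size=3) * 0.1
    perturbed = reward(np.tanh((weights + noise) @ x))
    # If the perturbation scored higher, move the connection strengths toward it.
    if perturbed > base:
        weights += noise

print("final output:", np.tanh(weights @ x))  # should end up close to 1.0
```

The point of the sketch is just that the goal is baked into reward() by whoever wrote it, and the weights are adjusted against that fixed yardstick, which is quite different from only being able to influence what the system experiences.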
And it seems to me Yudkowsky pretty consistently talks as though the key to making sure an AGI remains friendly is choosing the right utility/reward function to be present in the AGI from the start; for example, in the paper at http://intelligence.org/files/ComplexValues.pdf he writes:
>Omohundro (2008) lists preservation of preference among the “basic AI drives.”
>This in turn suggests an obvious technical strategy for shaping the impact of Artificial Intelligence: if you can build an AGI with a known utility function, and that AGI is sufficiently competent at self-modification, it should keep that utility function even as it improves its own intelligence, e.g., as in the formalism of Schmidhuber’s Gödel machine (Schmidhuber 2007). The programmers of the champion chess-playing program Deep Blue could not possibly have predicted its exact moves in the game, but they could predict that Deep Blue was trying to win—functioning to steer the future of the chessboard into the set of end states defined as victory.
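The Deep Blue point can also be illustrated with a toy sketch: an agent with a fixed utility function searches over possible action sequences and picks whichever one leads to the highest-scoring end state. The "world" below (a number line the agent pushes toward a target) and every name in it are hypothetical; the sketch is only meant to show how you can predict what such a system is optimizing for without being able to predict its exact moves:

```python
# Toy sketch: a fixed utility function steering the future toward "victory" states.
# The world, actions, and utility function are all made up for illustration.
from itertools import product

ACTIONS = (-1, 0, +1)          # toy moves on a number line
TARGET = 5                     # the "victory" end state

def utility(state):
    # Fixed utility function: higher the closer the end state is to TARGET.
    return -abs(state - TARGET)

def choose_plan(start, horizon=4):
    # Brute-force search over all action sequences of length `horizon`,
    # keeping the one whose resulting end state scores highest.
    best_plan, best_u = None, float("-inf")
    for plan in product(ACTIONS, repeat=horizon):
        end_state = start + sum(plan)
        u = utility(end_state)
        if u > best_u:
            best_plan, best_u = plan, u
    return best_plan

print(choose_plan(start=0))    # (1, 1, 1, 1): it always pushes toward TARGET
```

Nothing in the search loop cares about the particular moves, only about where they end up under the fixed utility function, which is the sense in which you can predict the goal without predicting the play.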