The temporal difference reinforcement learning model has successfully accounted for many aspects of phasic dopamine activity, but a number of major discrepancies have been discovered. Some of these discrepancies can be traced back to the choice of stimulus representation used by early models. In the real world, stimuli often provide ambiguous information about the underlying state, in which case the optimal representation is a conditional distribution over states given the observed stimuli: the belief state. I will present several experimental studies and computational analyses of the dopamine system that support this model. These findings demonstrate the importance of representational assumptions for understanding learning algorithms in the brain.
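To make the idea concrete, here is a minimal sketch of temporal difference learning over a belief state, assuming a simple two-state hidden Markov model with a linear value function over beliefs. All names, parameter values, and the specific model structure are illustrative assumptions, not details taken from the studies described above.

```python
import numpy as np

# Illustrative two-state hidden world; T[s, s'] = P(s' | s), O[o, s] = P(o | s).
n_states = 2
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])
O = np.array([[0.8, 0.2],
              [0.2, 0.8]])

w = np.zeros(n_states)                  # value weights, one per hidden state
alpha, gamma = 0.1, 0.95                # learning rate and discount factor
belief = np.ones(n_states) / n_states   # uniform prior over hidden states

def update_belief(belief, obs):
    """Bayesian filter: propagate through the dynamics, then reweight by likelihood."""
    predicted = T.T @ belief            # prior over the next hidden state
    posterior = O[obs] * predicted      # multiply by P(obs | state)
    return posterior / posterior.sum()

# One step of belief-state TD learning, given an observation and a reward.
obs, reward = 1, 0.0
new_belief = update_belief(belief, obs)
# Value is linear in the belief state: V(b) = w . b, so the TD error is
# delta = r + gamma * V(b') - V(b), and the weight update follows the gradient.
delta = reward + gamma * w @ new_belief - w @ belief
w += alpha * delta * belief
belief = new_belief
```

On this view, the belief state simply replaces the observed stimulus as the feature vector fed into an otherwise standard TD update; when observations identify the state unambiguously, the belief collapses to a one-hot vector and the sketch reduces to ordinary tabular TD learning.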