The model-free reinforcement learning (MFRL) framework, in which agents learn cached action values without estimating a model of their environment, has been successfully applied to neuroscience over the last two decades. It can account for most dopamine reward prediction error (RPE) signals in Pavlovian and instrumental tasks. However, it is still not clear why, in the Pavlovian autoshaping paradigm, RPEs can be recorded in some individuals but not in others. Moreover, the role of dopamine in functions not related to learning is still not understood.
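To make the MFRL notions of cached values and RPEs concrete, a minimal tabular Q-learning sketch is given below (in Python); the state and action counts and the parameter values are illustrative and not taken from any of the experiments discussed here.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative sizes and parameters):
# the agent caches action values Q(s, a) and updates them from the
# reward prediction error (RPE) delta, without learning a world model.
n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9          # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def mf_update(s, a, r, s_next):
    """One model-free update: delta plays the role of the dopamine RPE."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]  # RPE
    Q[s, a] += alpha * delta
    return delta
```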
Using a neurocomputational model of the basal ganglia, we have previously shown that changing the level of simulated tonic dopamine affects the exploration-exploitation trade-off. This predicted that, in addition to possible effects on learning, dopamine manipulations would also affect performance in a way that could look like a learning effect while directly altering only the trade-off.
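One standard way to formalize this trade-off is softmax action selection, in which an inverse-temperature parameter controls how strongly the agent exploits its current value estimates. The sketch below illustrates, under the purely hypothetical assumption of a linear mapping from tonic dopamine level to inverse temperature, how lowering dopamine flattens the choice distribution (more exploration) without touching the learned values.

```python
import numpy as np

def softmax_policy(q_values, beta):
    """Softmax action selection: higher beta -> more exploitation."""
    prefs = beta * (q_values - q_values.max())  # subtract max for stability
    p = np.exp(prefs)
    return p / p.sum()

def beta_from_dopamine(dopamine_level, beta_max=10.0):
    """Hypothetical linear mapping from simulated tonic dopamine to beta."""
    return beta_max * dopamine_level

q = np.array([1.0, 0.8, 0.2])   # fixed value estimates (illustrative)
for da in (0.2, 0.5, 1.0):      # lower dopamine -> flatter, more exploratory
    print(da, softmax_policy(q, beta_from_dopamine(da)))
```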
We then combined this computational principle with a model-based / model-free reinforcement learning model applied to Pavlovian autoshaping. In this model, in addition to MFRL, simulated agents also estimate a model of the consequences of their behavior on the environment (model-based reinforcement learning, MBRL). This model can explain inter-individual differences through different relative weights of MFRL and MBRL on behavior: the behavior of sign-trackers is mostly driven by MFRL, which produces dopamine RPEs and pushes them towards reward-predicting stimuli; in contrast, the behavior of goal-trackers is mostly driven by MBRL, in which dopamine RPEs are absent, and which pushes them towards the outcome of their behavioral responses. This model predicts that sign-trackers should be less sensitive to outcome devaluation than goal-trackers, which has recently been confirmed experimentally. Moreover, the model explains why injecting flupenthixol in goal-trackers impairs the exploration-exploitation trade-off and thus blocks the expression of a covert, dopamine-independent MBRL learning process.
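The core weighting idea can be sketched as follows; the blending rule, the parameter name omega, and all numerical values are illustrative assumptions rather than the model's actual equations.

```python
import numpy as np

def combined_values(q_mf, q_mb, omega):
    """Blend cached MF values and MB values; omega in [0, 1].
    omega near 1 mimics a sign-tracker (MF-dominated, RPE-driven);
    omega near 0 mimics a goal-tracker (MB-dominated)."""
    return omega * q_mf + (1.0 - omega) * q_mb

def softmax(v, beta):
    p = np.exp(beta * (v - v.max()))
    return p / p.sum()

# Illustrative values for two response options (e.g. approach the
# reward-predicting lever vs the food magazine).
q_mf = np.array([0.9, 0.3])   # cached model-free values
q_mb = np.array([0.4, 0.8])   # values computed from a learned model
for omega in (0.9, 0.1):      # sign-tracker-like vs goal-tracker-like
    print(omega, softmax(combined_values(q_mf, q_mb, omega), beta=3.0))
```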
Finally, we have recently injected different doses of flupenthixol in rats learning the different reward probabilities associated with three levers while facing non-signaled block changes (a non-stationary multi-armed bandit task). Using model-based analyses of the behavioral data, we found a dose-dependent effect of flupenthixol on the exploration parameter but not on the learning rate.
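A model-based analysis of this kind typically fits free parameters such as the learning rate and the exploration (inverse temperature) parameter to each session's choice data by maximum likelihood, and then compares the fitted values across drug doses. Below is a minimal sketch of such a fit on placeholder data; it is not the actual analysis pipeline used in the experiments.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards, n_arms=3):
    """Negative log-likelihood of a softmax Q-learner on bandit data."""
    alpha, beta = params
    Q = np.zeros(n_arms)
    nll = 0.0
    for a, r in zip(choices, rewards):
        p = np.exp(beta * (Q - Q.max()))
        p /= p.sum()
        nll -= np.log(p[a] + 1e-12)
        Q[a] += alpha * (r - Q[a])   # model-free value update
    return nll

# Placeholder session data; in practice these come from the rats'
# lever choices and obtained rewards.
choices = np.random.randint(0, 3, size=200)
rewards = np.random.binomial(1, 0.5, size=200)
fit = minimize(neg_log_likelihood, x0=[0.3, 2.0],
               args=(choices, rewards),
               bounds=[(1e-3, 1.0), (1e-3, 20.0)])
print(fit.x)  # estimated (alpha, beta); compare beta across doses
```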
Together, these results suggest a variety of mechanisms through which dopamine can act, which can be progressively better understood through tight collaboration between experimentation and computational modeling.