Contemporary reinforcement learning (RL) theory suggests that potential choices can be evaluated either by a model-free (MF) strategy, which learns their past worth through trial and error, or by a model-based (MB) strategy, which predicts their likely consequences from learned knowledge of how decision states transition to outcomes. Although recent human neuroimaging studies have provided some insight into the brain systems activated during MB RL tasks, the neural underpinnings of MB RL remain unknown, at least partly because of uncertainty about whether species other than humans are capable of MB RL behaviour. We trained monkeys to perform a two-stage decision task designed to elicit and discriminate the use of both MF and MB RL. Descriptive and trial-by-trial computational analyses revealed that both the task's transition structure (relevant to MB control) and the reward history (relevant to both MF and MB control) significantly influenced choices, reaction times and pupil diameter, with choices made according to a weighted combination of the two RL systems. Single-neuron correlates of key elements of MF and MB learning were observed across both frontal and striatal regions, but with functional dissociations. Striatal activity was consistent with a role in value updating, encoding reward prediction errors. In contrast, neurons in anterior cingulate cortex (ACC) encoded features of both MF and MB RL, suggesting a possible role in the use of, and/or arbitration between, the two RL strategies. Finally, neurons in frontal pole exhibited a unique role in counterfactual processing. These results have important implications for understanding the neural mechanisms that support complex learning-related processes in the primate brain.
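
To make the hybrid valuation concrete, the sketch below shows one conventional way to combine MF and MB values on a two-stage task with a weighting parameter w, in the spirit of standard hybrid models for such tasks (e.g. Daw et al., 2011). It is an illustrative assumption, not the authors' fitted model; all parameter names, values, and the transition matrix are hypothetical.

```python
# Hypothetical hybrid MF/MB agent for a two-stage decision task (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

alpha, beta, w = 0.3, 5.0, 0.5          # learning rate, inverse temperature, MB weight (assumed)
n_first, n_second = 2, 2                 # two first-stage actions, two second-stage states
T = np.array([[0.7, 0.3],                # assumed transition probabilities P(state2 | action1)
              [0.3, 0.7]])

Q_mf = np.zeros(n_first)                 # model-free values of first-stage actions
Q_stage2 = np.zeros(n_second)            # values of second-stage states (one action each, for brevity)
p_reward = np.array([0.8, 0.2])          # latent reward probabilities of second-stage states (assumed)

def softmax(q, beta):
    z = beta * (q - q.max())
    p = np.exp(z)
    return p / p.sum()

for trial in range(200):
    # Model-based values: expected second-stage value under the (assumed) transition structure
    Q_mb = T @ Q_stage2
    # Hybrid valuation: weighted combination of the MB and MF systems
    Q_net = w * Q_mb + (1 - w) * Q_mf

    a = rng.choice(n_first, p=softmax(Q_net, beta))   # first-stage choice
    s2 = rng.choice(n_second, p=T[a])                 # transition to a second-stage state
    r = float(rng.random() < p_reward[s2])            # probabilistic reward

    # Reward prediction errors drive value updating (the striatal-like signal described above)
    delta2 = r - Q_stage2[s2]
    Q_stage2[s2] += alpha * delta2
    delta1 = Q_stage2[s2] - Q_mf[a]                   # TD backup to the first stage
    Q_mf[a] += alpha * (delta1 + delta2)              # eligibility-trace-like carryover (assumed lambda = 1)
```

Under this kind of model, w = 0 reduces to purely MF choice and w = 1 to purely MB choice; the behavioural finding summarised above corresponds to an intermediate weighting.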