In our hectic world, even a routine task like driving involves numerous objects that constantly move, change, and interact. How can the human mind handle such dynamic information efficiently, flexibly, and reliably? Two largely independent research traditions offer very different answers. Studies of visual working memory use controlled tasks to expose critical capacity limits in how we represent and track stimuli, which are usually simple and arbitrary. Research on commonsense physical reasoning examines people’s intuitive expectations and predictions about realistic scenes, using computational models that are approximate but unlimited in capacity. Drawing on both traditions, the present work combines EEG, behavioral, and computational methods to reveal bidirectional links between working memory and intuitive physics. The first set of studies shows how high-level intuitive expectations shape online processing beyond explicit physical reasoning, directly guiding how working memory represents and tracks objects over and above visual or spatiotemporal information. The second set of studies reveals severe capacity limits in the ability to predict the future trajectories of moving physical objects: mental simulation appears to be a serial process, consistent with a single-item focus of attention in working memory. Overall, these findings highlight the need for a unified approach to the study of dynamic information processing.