Music has been transmitted orally for countless human generations, changing over time under the influence of biological, cognitive, and cultural factors. How does oral transmission shape the evolution of music, and why do human songs have the structure they do? Here we explore these questions through large-scale music evolution experiments with singing, in which melodies were orally transmitted from one participant to the next. Our results show that oral transmission plays a profound role in the emergence of musical structure. It shapes initially random sounds into more structured systems that increasingly reuse and combine a smaller number of elements (e.g., small pitch sets, small pitch intervals, arch-shaped melodic contours). However, we find that the emergence of these structures depends on a complex interplay between individual factors (e.g., vocal constraints and memory biases) and the social influences acting on participants during cultural transmission. Together, these results provide the first quantitative characterization of the rich collection of biases that oral transmission imposes on music evolution, offering a new understanding of how the structures of human song emerge through cultural transmission.