Sensory systems have evolved impressive abilities to process complex natural scenes in a myriad of environments. In audition, the brain’s ability to seamlessly solve the cocktail party problem remains unmatched by machines, despite a long history of intensive research in diverse fields ranging from neuroscience to machine learning. At a cocktail party and in other noisy scenes, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). This flexible dual-mode processing ability of normal-hearing listeners stands in sharp contrast to the extreme difficulty faced by hearing-impaired listeners, hearing assistive devices, and state-of-the-art speech recognition algorithms in noisy scenes. In this talk, I will first describe neurons at the cortical level in songbirds that display dual-mode responses to spatially distributed natural sounds. I will then present a computational model that replicates key features of the experimental data and predicts a critical role for inhibitory neurons in dual-mode responses. Finally, I will present recent data revealing similar phenomena in mouse auditory cortex and discuss our efforts to understand the role of cortical inhibitory neurons using a combination of electrophysiology, optogenetics, and computational modelling.