Sound detection amidst noise presents an important challenge in audition. Many naturally occurring sounds (e.g., rain, wind) can be described and predicted statistically; these are known as sound textures. Previous research has demonstrated humans’ ability to leverage this statistical predictability for sound recognition, but the neural mechanisms remain elusive. We trained mice to detect vocalizations embedded in sound textures with different statistical predictability, while recording and optogenetically modulating neural activity in the auditory cortex. Mice showed improved performance and neural representation when they sampled the statistics for longer within a trial. Textures with more exploitable structure, specifically higher cross-frequency correlations (CFCs), improved performance, background representation, and vocalization decoding. Activating parvalbumin-positive (PV) interneurons had an asymmetric effect, improving detection and neural representation of vocalizations for low CFCs, with the opposite effect for high CFCs. Thus, mice can exploit stimulus statistics to improve sound detection in noise, as reflected in performance and neural activity, while relying on PV interneurons.