Haemophilus influenzae is a commensal Gram-negative bacterium present in the nasopharynx of most healthy children and adults, from where it can occasionally spread to cause disease, including otitis media, meningitis and other invasive infections. H. influenzae isolates can be differentiated on the basis of the presence of a capsule, with six distinct capsule types described to date (a, b, c, d, e and f). Encapsulated isolates are defined as typeable, whereas those lacking a capsule are described as non-typeable (NTHi). Before implementation of the polysaccharide conjugate vaccine, H. influenzae serotype b was responsible for invasive disease epidemics across the globe. Today, NTHi and non-b serotypes are major causes of disease, concomitant with an increase in antimicrobial resistance. Here, we present the population biology of H. influenzae, based on a global collection of published whole genome sequence (WGS) data from 1909 H. influenzae isolates. Using a gene-by-gene approach, WGS data were characterised across the genome and the capsule locus was annotated. Our findings demonstrate how molecular epidemiology can inform vaccine development, and how other researchers can contribute to this field.