Next-generation sequencing has transformed the study of microbial ecology. Thanks to cheap, efficient amplification and sequencing, taxonomic marker genes such as 16S rRNA are used to inventory the bacteria present in many different environments. In particular, studies of the gut microbiome have the potential to shed light on important health disorders such as obesity, diabetes, and Crohn’s disease.
We introduce a Bayesian factor analysis for discrete samples of species drawn from many environments. The marginal prior on the distribution of species in each environment is a normalized completely random measure, and the dependence between environments is described through latent continuous factors. The procedure is nonparametric in two ways: the number of species need not be finite, and the dimensionality of the factors is learned from the data. We demonstrate that the analysis yields good estimates of the species distributions. We also develop a method for visualizing credible regions in ordination methods popular in microbiology, by aligning posterior samples through conjoint analysis.
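To make the ingredients of such a model concrete, here is a minimal, hypothetical Python sketch of a finite-truncation generative process. It is not the specification used in this work. The gamma-distributed jumps stand in for a gamma completely random measure truncated to `n_species` atoms. The link between species and environment factor scores (a rectified, squared linear predictor) is an illustrative assumption, as are all function and parameter names.

```python
import numpy as np


def simulate_environment_distributions(n_species, n_envs, n_factors, alpha=1.0, seed=0):
    """Return an (n_species, n_envs) matrix whose columns are species distributions.

    Hypothetical finite-truncation sketch; the functional form below is an
    illustrative assumption, not the exact specification of the analysis.
    """
    rng = np.random.default_rng(seed)
    # Independent Gamma(alpha / n_species, 1) jumps: a finite stand-in for the
    # jumps of a gamma completely random measure over species.
    jumps = rng.gamma(shape=alpha / n_species, scale=1.0, size=n_species)
    # Latent continuous factors: one score vector per species and per environment.
    species_factors = rng.normal(size=(n_species, n_factors))
    env_factors = rng.normal(size=(n_envs, n_factors))
    # Assumed link: environments with similar factor scores up-weight similar species.
    signal = species_factors @ env_factors.T + rng.normal(size=(n_species, n_envs))
    weights = jumps[:, None] * np.maximum(signal, 0.0) ** 2
    # Normalizing each column turns the weights into a probability distribution.
    return weights / weights.sum(axis=0, keepdims=True)


def sample_counts(probs, depth, seed=0):
    """Draw multinomial read counts (environments x species) from each column of probs."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.multinomial(depth, probs[:, j]) for j in range(probs.shape[1])])


probs = simulate_environment_distributions(n_species=200, n_envs=5, n_factors=3, seed=1)
counts = sample_counts(probs, depth=10_000, seed=2)
```

In this sketch the number of factors is fixed by hand and the species list is truncated, whereas the analysis described above learns the factor dimensionality and does not require a finite number of species. Environments whose factor scores are close share similar species weights, which is the kind of dependence the latent factors are meant to capture.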
Joint work with Boyu Ren, Susan Holmes, Lorenzo Trippa, and Stefano Favaro.