Metagenomic Meta-Analysis Illuminates a Vast Universe of Genes in the Human Microbiome

Student Researcher:
Marc Beaudin

Supervisor / Principal Investigator:
Aleksandar Kostic

Additional Authors:
Braden Tierney
Zhen Yang
Jacob Luber
Marsha Wibowo
Chirag Patel

MD Class of 2021


Despite immense interest in the role of the human microbiota in disease, we still lack a grasp of the scope of the microbiome’s genetic content, a question crucial for understanding microbial function in the context of the host. The human body contains roughly equal numbers of microbial and somatic cells, and the microbiome is currently estimated to encode 100x more genes than the human genome [1,2,3], though this estimate has yet to be systematically investigated at large scale. To address the size of the gene universe in an entire human microbiome niche, we undertook a meta-analysis of every readily accessible shotgun-sequenced human oral microbiome metagenomic sample (n=1,473). Surprisingly, half of all genes were observed only once across all samples. Accordingly, we find the mean dissimilarity across all samples to be 0.95 at the gene level, compared to only 0.43 at the taxonomic level, revealing staggering genetic heterogeneity. This result challenges the paradigm of microbial taxonomy and genetics in the human microbiome, as the 24 million genes we identify belong to only 788 unique species across our 1,473 samples. Furthermore, we estimate that the worldwide human oral microbiome alone encodes 220 million unique genes, and that adequately sampling this space, such that each new metagenome contains only 1% undiscovered genes, will take nearly 20,000 samples. These results therefore serve as 1) a potential explanation for the large heterogeneity observed in microbiome-derived human phenotypes, 2) inspiration for gene-centric approaches in future microbiome studies, and 3) a quantification of the immense need for even larger-scale metagenomic analyses than currently exist.
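
To make the two summary statistics above concrete, the following is a minimal sketch of how a singleton-gene fraction and a mean pairwise dissimilarity could be computed from per-sample gene presence data. It is illustrative only: the abstract does not name the dissimilarity measure, so Jaccard dissimilarity is an assumption here, and the sample names, gene names, and function names are hypothetical toy values rather than study data.

```python
from itertools import combinations


def singleton_fraction(samples):
    """Fraction of distinct genes that occur in exactly one sample."""
    counts = {}
    for genes in samples.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B|; assumed metric, not stated in the abstract."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_dissimilarity(samples):
    """Average Jaccard dissimilarity over all unordered sample pairs."""
    pairs = list(combinations(samples.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: each sample maps to the set of genes detected in it.
toy_samples = {
    "s1": {"geneA", "geneB", "geneC"},
    "s2": {"geneA", "geneD"},
    "s3": {"geneA", "geneE", "geneF"},
}

print(singleton_fraction(toy_samples))
print(mean_pairwise_dissimilarity(toy_samples))
```

In this toy example one gene is shared by every sample while all others are singletons, so the dissimilarity is high even though the samples overlap; the same computation applied to sets of species rather than genes would give the taxonomic-level counterpart.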