Goltsman and colleagues present an ambitious longitudinal analysis of 10 pregnant women sampled at three body sites - vagina, stool, mouth - at 10 time points collected every three weeks over their pregnancy 1. I should note that I do not know the current status of this manuscript and have not been invited to review it for a journal. This builds on previous work from the Relman lab (DiGiulio et al.) that focused on 16S rRNA gene sequencing analysis of a larger cohort of 40 women. The role, dynamics, and diversity of the microbiota associated with women during pregnancy are interesting questions that this group and others have been exploring in detail over the past decade. Although the amount of data generated here is impressive, I came away unsatisfied by the lack of a story told in depth. At the end of the abstract the authors state, “This work underscores the dynamic behavior of the microbiome during pregnancy and suggests the potential importance of understanding the sources of this behavior for fetal development and gestational outcome.” This is a big claim and ultimately, I don’t think the authors have lived up to it in this manuscript.

The opening paragraph describes pregnancy as a unique immune state and lists health changes in the woman. However, it was not clear which health change this study was most interested in. In their design there are 7 women who carried the pregnancy to term and 3 who had pre-term births, albeit for a variety of reasons. Reading this paragraph and the rest of the introduction, I was left wondering which health phenotype the authors were most interested in and why they thought the microbiota had a role. The changes in immune status, hormones, physiology, etc. are all likely to impact the microbiota, but how do they hypothesize that the microbiome would impact those phenotypes? There is a reference to a Perspective article by Charbonneau et al. 2016, but that’s it. This will be a theme throughout my review, but I really think the authors could have significantly enhanced their story by focusing on a specific question related to the impact that the microbiota has on the mother’s health or a specific question related to the vaginal ecology. For all the data that were generated in this study, the authors sum up their work by stating that “the goal of this work was to survey the genome content and functional potential of microbial communities during gestation.” Given their cohort and ability to generate and analyze large datasets I had hoped for more.

After the opening paragraph of the introduction, the authors go on to state that most previous work has focused on 16S rRNA gene sequencing of women throughout pregnancy and that there is a need for more detailed genome-level analysis. That may be true, but again, I am not convinced that the questions that the authors were interested in justified the use of shotgun metagenomics. There was minimal gene-level analysis, minimal discussion of possible changes in bacterial physiology, minimal description of genetic analysis, etc. In my opinion, these are the reasons to do shotgun metagenomic sequencing. I left the paper unconvinced that they really gained much by generating all of this data.

As the authors know, 16S rRNA gene sequencing is very useful for generating deep taxonomic descriptions of communities. The first pages of the results section describe the taxonomic composition of the communities based on their ability to retrieve full-length sequences from their metagenomes. Yet, Table 1 indicates that they generated about 2700 to 5000 Mbp per subject, but that 32 to 97% of the data was human sequence data. This paper focuses on vaginal communities, which were 97% human data (i.e. 150 Mbp of non-human DNA). This means that they had about 30-50 genome equivalents of sequence data from each vaginal sample (3-5 Mbp/genome), which is analogous to generating 35 to 50 16S rRNA gene sequences (~7 copies per genome). Even looking at non-16S rRNA genes, this is a relatively shallow level of characterization of the diversity in this community. They are perhaps saved by the fact that many of the early vaginal samples were dominated by only a couple of bacterial taxa.
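For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch of the genome-equivalent arithmetic. It assumes the upper end of the sequencing depth quoted from Table 1 (about 5000 Mbp), the 97% human fraction, and the 3-5 Mbp range of bacterial genome sizes used above; the function and variable names are mine, not the authors'.

```python
def genome_equivalents(total_mbp, human_fraction, genome_size_mbp):
    """Express the non-human sequence depth in units of bacterial genomes."""
    non_human_mbp = total_mbp * (1.0 - human_fraction)
    return non_human_mbp / genome_size_mbp


# Values quoted in the review (assumed here to be the upper end of Table 1).
total_mbp = 5000        # total sequence generated, in Mbp
human_fraction = 0.97   # share of vaginal reads that were human

non_human_mbp = total_mbp * (1.0 - human_fraction)

# Larger genomes give fewer genome equivalents, so 5 Mbp is the low bound
# and 3 Mbp is the high bound.
low = genome_equivalents(total_mbp, human_fraction, genome_size_mbp=5.0)
high = genome_equivalents(total_mbp, human_fraction, genome_size_mbp=3.0)

print(f"non-human sequence: {non_human_mbp:.0f} Mbp")
print(f"genome equivalents: {low:.0f} to {high:.0f}")
```

Swapping in the lower end of the depth range (about 2700 Mbp) or a different human fraction shows how quickly the effective depth of the bacterial fraction shrinks.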

I was left wondering why the authors didn’t use their prior 16S rRNA gene sequence data to formulate a more specific set of questions. The stool and oral communities receive only a cursory analysis - I suspect the large diversity and considerable dynamics made it difficult to say much more with a relatively small cohort. So why sequence them at all? Throughout the manuscript the authors make a variety of hypotheses that I would contend could have been formulated based on their prior 16S rRNA gene sequence data. For instance… (1) Women with L. iners have increased diversity with time, (2) multiple Gardnerella strains co-existing, (3) Role of phage dynamics in shaping the microbiota, (4) Lactobacillus genetic diversity, (5) overlap between vaginal and gut communities, and (6) pre-term birth. Because the authors focused on sequencing samples from women who had complete time courses, each of these hypotheses receives relatively limited analysis because they are underpowered to say more.

For example, take the hypothesis that women with L. iners have increased diversity with gestation. Of the 10 women they sampled, 3 were dominated by L. iners initially. Instead of sequencing the 7 other women, could they have selected other women that were also dominated by L. iners initially, as indicated by their prior 16S rRNA gene sequence analysis? Three women seems like a small number to base this claim on. But again, this is a hypothesis that could be tested by 16S rRNA gene sequencing. The next question I might ask is whether the phage communities (detected via CRISPR sequences) are driving the change. Or what is the functional capacity of the genomes that replaced L. iners? As the paper is currently written, none of these hypotheses feel like they have gotten the attention that they deserve to tell a compelling story.

The authors close by saying, “The results reported here suggest dynamic behavior in the microbiome during pregnancy and highlight the importance of genome-resolved strain analyses to further our understanding of the establishment and evolution of the human microbiome.” This is a pretty awesome statement, but I don’t see how these data get to describing the establishment of the microbiome in the child or the evolution of the human microbiome. I would encourage the authors to focus their analysis and story and to dig deeper into the hypotheses that they have already laid out.

Minor comments.

P5L9 “EMIRGE (Dick et al 2009)” - I’m pretty sure that this is not the right reference. The listed reference is for ESOM mapping of tetranucleotides. I believe they mean to cite Miller et al. 2011 here.

The authors frequently switch between “rDNA” and “rRNA” (e.g. P5L22 and P5L25). I would encourage them to use “rRNA genes” since there is no DNA in a ribosome.

On P6L16 the authors mention the UAB and Stanford cohorts without any other background in the manuscript.



  1. I have posted a copy of this review at bioRxiv. Please post any comments there.