This study characterizes the gut microbiome and metabolome of 412 U.S. infants aged one to three months and relates them to birth mode (cesarean section vs. vaginal birth), feeding method, and health outcomes at two years of age. It finds a widespread Bifidobacterium deficit regardless of birth mode or feeding method, with roughly 25% of infants lacking detectable Bifidobacterium. Bifidobacterium-dominant microbiomes carry fewer antimicrobial resistance and virulence factor genes and show distinct metabolic signatures. In infants born by cesarean section, breastfeeding was associated with lower, not higher, infant Bifidobacterium abundance, with potentially pathogenic species such as Clostridium perfringens occupying the human milk oligosaccharide utilization niche instead. Two-year follow-up data suggest that the absence of key Bifidobacterium may contribute to the development of atopy.
Article - Open access
Bifidobacterium deficit in United States infants drives prevalent gut dysbiosis
Communications Biology volume 8, Article number: 867(2025)
Abstract
The composition of the infant gut microbiome is critical to immune development and noncommunicable disease (NCD) trajectory. However, a comprehensive evaluation of the infant gut microbiome in the United States is lacking. The My Baby Biome study, designed to address this knowledge gap, evaluated the gut microbiomes of 412 infants (representative of U.S. demographic diversity) using metagenomics and metabolomics. Regardless of birth mode and/or feeding method, widespread Bifidobacterium deficit was observed, with approximately 25% of U.S. infants lacking detectable Bifidobacterium. Bifidobacterium-dominant microbiomes exhibit distinct features when compared to microbiomes with other dominant microbial compositions including reduced antimicrobial resistance and virulence factor genes, altered carbohydrate utilization pathways, and altered metabolic signatures. In C-section birth infants, Bifidobacterium tended to be replaced in the human milk oligosaccharide utilization niche with potentially pathogenic species. Longitudinal health outcomes from these infants suggest that the disappearance of key Bifidobacterium may contribute to the development of atopy.

Introduction
Noncommunicable diseases (NCDs) have markedly increased over the last five to six decades and their worldwide prevalence now eclipses communicable diseases1. The increase in NCDs affects both children and adults, with emerging data demonstrating that some NCDs manifesting in either childhood or adulthood initiate in the “First 1000 Days” of life, which includes development in-utero and the first two postnatal years2. This dramatic development reflects the effects of modernization or industrialization on the environment. The Hygiene Hypothesis3 and more recent refinements4,5,6,7 invoke alterations in the host gut microbiome as an explanation for this trend. Of particular interest in the context of infant health is the disappearance of certain Bifidobacterium strains, cornerstone microbes associated with numerous health outcomes8, from the gut microbiome of many individuals in industrialized societies7,9,10.
Birth mode, feeding method, and antibiotic use all impact the composition of the infant gut microbiome11, which has been linked to childhood allergic disease, autoimmune disease, obesity, and abnormal neurodevelopment12,13,14,15,16,17,18,19 as well as the efficacy of childhood immunization20,21. These findings in the United States, however, are based on small, isolated studies of the infant gut microbiome10,22. The extent of this problem and the implications for long-term health outcomes for infants being born today in the United States have not been fully explored.
Herein are initial results from the My Baby Biome study (n = 412), the largest cross-sectional sampling to date across the United States of the infant gut microbiome and metabolome. The data demonstrates that infants one to three months of age have a widespread deficit of certain Bifidobacterium strains that heretofore predominated in the infant gut and are critical for healthy development. These infants are at risk for noncommunicable diseases, as demonstrated by the finding of increased risk of an adverse immunological outcome by two years of age. Although the term dysbiosis can be contentious and difficult to define, this correlation between microbiome composition and infant health outcomes suggests that a lack of key Bifidobacterium can be defined as a true dysbiosis of the infant gut.
Results
Taxonomic analysis of the infant gut microbiome
The My Baby Biome study is a seven-year longitudinal study monitoring health outcomes and gut microbiome composition and function in infants in the United States (Fig. 1A). A decentralized clinical trial approach was taken to ensure demographics reflecting the United States population (Supplementary Fig. 1A, B)23. The distribution of birth modes (vaginal and C-section) and feeding methods (strictly breastfed, strictly formula fed, and combination fed, i.e., a mixture of formula and breastfeeding) among participants in the study was comparable to data recorded by the Centers for Disease Control and Prevention (CDC) (Supplementary Fig. 1C)24,25. The study obtained fecal samples from participants in 48 of 50 states (Supplementary Fig. 1D).
a Participant journey. b, c Aitchison distance PCoA plots of infant gut microbiomes in the My Baby Biome study (n = 412). Subfigure (b) is shaded to illustrate the relative abundance of infant associated Bifidobacterium (B. infantis, B. bifidum, B. breve, and B. longum); samples high in infant Bifidobacterium are seen in the lower right corner. Subfigure (c) is a PCoA showing organization of samples by birth mode. d Volcano plot for prevalence showing the conditional log odds-ratios (conditioned on birth mode), highlighting species that are observed significantly more often in vaginally born infants (green) or C-section born infants (red). e Bar plot showing the log2 fold change in species abundance between exclusively breastfed and non-breastfed infants (including both formula-fed and mixed-fed) from ANCOVA analysis, adjusted for DNA collection/isolation method and birth mode. Positive values (purple) indicate species enriched in breastfed infants, while negative values (blue) indicate enrichment in non-breastfed infants. f Infant Bifidobacterium abundances for each sample grouped by feeding mode and birth mode. g Bar plot showing GLM-derived association between infant Bifidobacteria presence and species abundance, adjusted for feeding mode and DNA collection/isolation method. Adjusted significance levels: * is p < 0.05, ** is p < 0.01, *** is p < 0.001.
Metagenomic and metabolomic analysis was performed on fecal samples collected from infants one to three months of age. This collection time frame is considered critical for infant immune development26,27,28 and eliminates the effects on the gut microbiome of introduction of solid food29,30. We detected 559 distinct species above our classification noise level of 0.5% relative abundance, with the average number of species per infant sample being 12.1 (with a standard deviation of 5.5). These numbers exemplify the simplification of the infant microbiome relative to the adult microbiome31. Based on the metagenomic analysis, Bifidobacterium as a whole are both prevalent and abundant in the dataset (Supplementary Table 1), but a bimodal distribution is observed; 24% of infants (19% of vaginally born and 35% of C-section born) lack observable Bifidobacterium. Bifidobacterium tended to be highly abundant when present, suggesting a lack of exposure in these other infants7,9. Four Bifidobacterium (Bifidobacterium breve, Bifidobacterium bifidum, Bifidobacterium longum subsp. longum, and Bifidobacterium longum subsp. infantis) were found to be the most abundant species when present (Supplementary Table 1). Henceforth, we refer to these Bifidobacterium as infant Bifidobacterium based on their abundance in our samples and prior literature7,32,33,34,35. B. longum subsp. infantis (subsequently referred to as B. infantis for brevity) was particularly rare, observed in only 8% of samples (Supplementary Fig. 2A)22. These infant Bifidobacterium drive separation of the samples along the principal coordinates (Fig. 1B).
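The species counts and prevalence figures above depend on applying a relative-abundance floor to each sample. The following is a minimal sketch of that bookkeeping, not the study's pipeline: the function names and the toy read counts are hypothetical, and only the 0.5% floor comes from the text.

```python
NOISE_FLOOR = 0.005  # 0.5% relative abundance, the threshold quoted in the text


def detected_species(sample, floor=NOISE_FLOOR):
    """Return the species whose relative abundance in one sample exceeds the floor."""
    total = sum(sample.values())
    if total == 0:
        return set()
    return {sp for sp, count in sample.items() if count / total > floor}


def prevalence(samples, genus_prefix):
    """Fraction of samples with at least one detected species of the given genus."""
    hits = sum(
        any(sp.startswith(genus_prefix) for sp in detected_species(s))
        for s in samples
    )
    return hits / len(samples)


# Hypothetical toy read counts (species -> reads); not data from the study.
samples = [
    {"Bifidobacterium breve": 800, "Escherichia coli": 200},
    {"Clostridium perfringens": 600, "Klebsiella pneumoniae": 399,
     "Bifidobacterium longum": 1},   # 0.1% of reads, below the floor
    {"Bifidobacterium longum": 500, "Bacteroides fragilis": 500},
    {"Escherichia coli": 1000},
]

richness = [len(detected_species(s)) for s in samples]
bifido_prevalence = prevalence(samples, "Bifidobacterium")
print(richness, bifido_prevalence)
```

In this toy set the second sample contains a trace of B. longum that falls below the floor, so it counts as lacking detectable Bifidobacterium, which is how a "lack of observable Bifidobacterium" would be operationalized under this assumption.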
Consistent with previous observations, feeding method29,30,36 (Supplementary Fig. 2B, Supplementary Table 2) and birth mode19,29,30 (Fig. 1C) played a significant role in the composition of the infant microbiome. We saw no significant trends in composition based on age (Supplementary Fig. 2C), gender (Supplementary Fig. 2D), race, or geographic location (Supplementary Table 2). We compared the prevalence of species based on birth mode, as prevalence reflects microbial transmission, and the abundance of species based on feeding method, as abundance reflects establishment of the microbial niche. We found that all species more prevalent in vaginal birth infants were in the phylum Bacteroidota (Fig. 1D), consistent with past observations37. C-section birth infants were primarily enriched in Firmicutes. Bifidobacterium were not directly linked to birth mode in the dataset, suggesting that acquisition of Bifidobacterium is not limited to vaginal birth and can occur through the environment, as has been suggested38. Known human milk oligosaccharide (HMO) consumers, such as Bifidobacterium infantis, had increased abundance in the breastfed group relative to the formula or mixed feeding groups (Fig. 1E). Potentially pathogenic species, such as Klebsiella pneumoniae39 and Clostridium perfringens40,41,42,43, were also enriched in the breastfed group, suggesting an ability to occupy the HMO consumption niche. Clostridium perfringens was enriched both in C-section and breastfed groups, suggesting the possibility of combined effects for birth and feeding mode.
In evaluating the synergy between birth mode and feeding method (Fig. 1F), breastfeeding was associated with increased Bifidobacterium levels in vaginal birth infants. Surprisingly, we saw the opposite trend in C-section birth infants, where breastfeeding was associated with a decrease in infant Bifidobacterium. This suggests that in C-section birth infants, breastfeeding enables the colonization of other HMO consumers which occupy this niche and likely inhibit the post-birth colonization of Bifidobacterium. To better understand this observation, we determined which bacteria increased as a function of infant Bifidobacterium absence (Fig. 1G). The most statistically significant among these is the aforementioned Clostridium perfringens, a potential pathogen with known HMO utilization capabilities44.
To agnostically evaluate compositional differences throughout the population, samples were grouped using Dirichlet Multinomial Mixture (DMM) model clustering45. This yielded three clusters of infant gut microbiomes: C1 (24%), C2 (37%), and C3 (39%) (Fig. 2A). Similar clustering results were obtained with GUniFrac sample distance measurements46 and hierarchical clustering. DMM clusters were generated solely on microbial composition but were strongly associated with birth mode and feeding method (Fig. 2B). C3 had an abundance of C-section births relative to C1 and C2, whereas C1 had an abundance of breastfed infants relative to C2 and C3. There were 23 species with statistically significant differences between the DMM clusters (Supplementary Fig. 2E). C1 is associated with high abundance of HMO consuming Bifidobacterium (such as B. breve). C2 is associated with a high abundance of B. longum and overall higher abundance of Bacteroidota, and C3 is associated with high abundance of Firmicutes and Proteobacteria (Fig. 2C). The most significantly enriched microbe in Cluster 3 is C. perfringens (Supplementary Fig. 2E). Random Forest permutation feature importance analysis shows that the absence of B. longum and B. breve is the strongest driver of cluster separation, suggesting that the depletion of beneficial Bifidobacterium species, rather than the presence of others, primarily drives microbial shifts in this dataset (Supplementary Table 3). These clusters expand upon but are consistent with past observations22,47,48,49. The dearth of Bifidobacterium in C3 is particularly striking (Supplementary Fig. 2F), as nearly 40% of infants in the study are in this group.
a Principal Coordinate Analysis (PCoA) of infant gut microbiomes with the Aitchison distance used to measure the similarity between samples. Clustering was performed with Dirichlet Multinomial Mixtures on species level counts, generating three clusters based on lowest Laplace approximation. b Significant associations were observed between the clusters and both birth mode (chi-squared < 0.01) and feeding mode (chi-squared < 0.05), with C3 having more C-section births, C2 having more vaginal births, and C1 having a preponderance of breastfed infants (C1 n = 99, C2 n = 151, C3 n = 162, vaginal birth n = 273, C-section n = 139, breastfed n = 222, mixed feeding n = 138, formula fed n = 52). c Genus-level relative abundance of bacterial taxa across three distinct clusters (C1, C2, and C3). Each cluster is visualized with a bar plot representing mean genus abundance, where genera are color-coded according to their corresponding phylum. The top species contributor for each phylum within each cluster is labeled above the corresponding genus bar. Additionally, each bar plot is accompanied by a pie chart that summarizes the overall phylum distribution for that cluster.
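The Aitchison distance used for the PCoA plots is the Euclidean distance between centered log-ratio (CLR) transformed compositions. The sketch below illustrates that definition only; the pseudocount used to handle zero counts is an assumption, since the text does not state how zeros were treated, and the example vectors are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount is an assumed zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y, pseudocount=0.5):
    """Euclidean distance between the CLR transforms of two count vectors."""
    cx, cy = clr(x, pseudocount), clr(y, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


# Hypothetical species-level counts for two samples over the same four species.
bifido_rich = [900, 50, 40, 10]
bifido_poor = [0, 400, 500, 100]
print(aitchison_distance(bifido_rich, bifido_poor))
```

Because the transform works on log-ratios, two samples with the same proportions but different sequencing depth have a distance of zero when no pseudocount is needed, which is the property that makes this metric suited to compositional data.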
Functional analysis of the infant gut microbiome
To better understand the functional perturbations in the infant gut microbiome caused by a lack of Bifidobacterium, we evaluated HMO utilization capability, general carbohydrate utilization capability, antimicrobial resistance genes and virulence factors across the data set. HMO utilization is a key function of the infant microbiome, as HMOs are the third most abundant solid component in milk after lipids and lactose50 and they play a critical role in immune development51,52 and maturation of the microbiome53. Bifidobacterium species, particularly B. infantis, are the primary commensals equipped with genes for HMO degradation54,55,56,57 but they are not the sole contributors to HMO degradation44,58,59.
We evaluated genes involved in HMO consumption and determined the primary taxonomic contributors for each sample and within each DMM cluster (Fig. 3). Approximately 97% of the samples in C1 had Bifidobacterium (with a predominance of B. breve) as the top HMO taxonomic contributor, while C2 had about 65% of the samples with Bifidobacterium (with a predominance of B. longum) as the top HMO taxonomic contributor. The shift in Bifidobacterium abundance and species prevalence in C2 relative to C1 was associated with a notable reduction in HMO gene clusters H4 (p value < 1e-33) and H5 (p value < 1e-33) that use a diverse set of HMOs60. In C3, much of the HMO utilization capability came from the class Clostridia (like C. perfringens), demonstrating their ability to fill the HMO utilization niche, as described above. This observation of HMO utilization in the absence of Bifidobacterium aligns with studies showing limited HMO utilization capability in a wide variety of bacteria due to nonspecific carbohydrate utilization44,58,59,61. High levels of urease metabolism were also observed in C3 (p value < 1e-6), discordant with the overall lower levels of HMO utilization in the cluster. Urease genes are critical for B. infantis as they provide a nitrogen utilization niche, breast milk urea, to complement the HMO carbon utilization niche62. However, outside of Bifidobacterium, the role of urease genes is complicated and they are often associated with pathogenesis63,64. The high levels of urease metabolism in C3 were associated with Firmicutes and Proteobacteria, suggesting that in the absence of Bifidobacterium, the nitrogen urea metabolism niche may contribute to ecological shifts similarly to the HMO utilization niche.
Relative abundances of various functional genomic analyses are shown, with samples arranged in columns organized primarily by DMM cluster (top row) and functions arranged in rows. Signals along the rows are z-scored, with mean values shown in white, values above the mean shaded in red, and values below the mean shaded in blue. HMO gene analogs for each sample are presented where signals are accumulated by KEGG, gene name, and ortholog clustering for Bifidobacterium infantis blon genes organized into the HMO utilization clusters H1-H5 and Urease. The species contributing the most to the HMO utilization signals for each sample were tracked and the grouping is shown above; C1 HMO genes are most often coming from B. breve, C2 genes are frequently coming from B. longum, and C3 HMO genes primarily originate from species belonging to Clostridia, Bacilli, and Gammaproteobacteria. Virulence factors and Antimicrobial Resistance genes are seen to be more abundant in C3. Sialidases and fucosidases are more abundant in C2 and C1.
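The row-wise z-scoring described in this caption can be sketched as follows. This is an illustration under assumptions: the caption does not say whether the population or the sample standard deviation was used (the population form is assumed here), and the matrix values and the helper name `zscore_rows` are hypothetical.

```python
import statistics


def zscore_rows(matrix):
    """Z-score each row (one function across samples); constant rows map to zeros."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)  # population SD, an assumed choice
        scaled.append([0.0 if sd == 0 else (v - mean) / sd for v in row])
    return scaled


# Hypothetical relative abundances: rows are functions, columns are samples.
abundances = [
    [0.10, 0.20, 0.30],   # e.g. one HMO gene cluster across three samples
    [0.05, 0.05, 0.05],   # a constant row, which carries no contrast
]
for row in zscore_rows(abundances):
    print([round(v, 3) for v in row])
```

After scaling, zero corresponds to the row mean (white in the heatmap), positive values to above-mean signal (red), and negative values to below-mean signal (blue).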
We utilized the carbohydrate-active enzymes (CAZy) database65 to evaluate broader carbohydrate utilization patterns. We focused on fucosidases and sialidases to better understand HMO utilization in the clusters, as these enzyme classes have the potential to degrade the common HMO structural motifs fucose and sialic acid66. For fucosidases, C3 is significantly depleted in host-glycan substrate utilization and milk polysaccharide utilization (p value < 1e-03) relative to C1 and C2. For sialidases, both C2 and C3 have significantly lower host-glycan sialidase activity (p value < 1e-4). These utilization patterns further suggest that C1 communities are poised to utilize the breadth of HMOs, whereas in C2 and C3 (representing 76% of infants) HMO utilization is functionally limited and occurs opportunistically.
We evaluated two potential indicators of dysbiosis in adults, antimicrobial resistance (AMR) genes and virulence factor (VF) genes67,68, to understand how they are affected by the presence of Bifidobacterium in infants. AMR genes can pose a great burden in fighting infectious disease and VF genes play a substantial role in defining microbe-microbe and microbe-host interactions. A significant difference in AMR gene relative abundance was observed as a function of birth mode (Fig. 4A) with vaginal births showing a higher abundance. This result is surprising given the preponderance of pathogens that can be acquired in C-section birth69. Infants delivered via C-section had a slightly, but statistically significantly, higher relative abundance of VF genes compared to those with a vaginal birth (Fig. 4C) and exclusive breastfeeding led to higher VF gene abundance relative to other feeding methods (Supplementary Fig. 3A). This was primarily driven by breastfed individuals delivered via C-section (Supplementary Fig. 3B), in line with the changes in species enrichment (i.e. C. perfringens) we observed in breastfed, C-section born infants. AMR gene and VF gene abundance were negatively correlated with the abundance of infant Bifidobacterium (Fig. 4B, D), as has previously been observed70,71. C3 had the highest AMR gene and VF gene abundance (Supplementary Fig. 3C, D), followed by C2 and then finally C1. These observations demonstrate a substantial role, regardless of birth mode or feeding method, for Bifidobacterium in reducing the burden of AMR genes in infants and modulating the ecosystem of the infant microbiome.
a Antimicrobial resistance (AMR) gene abundance for each sample was calculated and infants born vaginally are observed to have a statistically significant increase in AMR gene abundance. b AMR gene abundance is lower in samples with higher infant associated Bifidobacterium (Spearman p value < 1e-10, rho = −0.71). c Total virulence factor gene abundance was calculated for each sample and a significant increase in virulence factor abundance for C-section born infants was observed. d Virulence factor abundance is reduced as infant associated Bifidobacterium abundance is increased (Spearman p value < 1e-10, rho = −0.43).
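The Spearman coefficients reported in this caption are rank correlations: each variable is replaced by its ranks (ties receive the average rank) and a Pearson correlation is computed on the ranks. The following sketch shows the calculation on hypothetical values; it does not reproduce the study's data or its p value computation.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: infant Bifidobacterium relative abundance vs. AMR gene abundance.
bifido = [0.90, 0.50, 0.10, 0.00]
amr = [1.0, 3.0, 7.0, 20.0]
print(spearman_rho(bifido, amr))
```

A strictly decreasing relationship such as the toy one here gives a coefficient of −1; the reported values of −0.71 and −0.43 indicate a strong and a moderate monotone decrease, respectively.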
Metabolic analysis of the infant gut microbiome
To understand the impact of Bifidobacterium on metabolism in the infant gut microbiome, we quantified a panel of 79 metabolites using isotopically labeled internal standards in 109 breastfed infant fecal samples. A total of 157 samples across feeding modes were initially evaluated, but feeding mode was found to be a substantial confounder, impacting 29 different metabolites in the dataset (Supplementary Fig. 5A). In the breastfed samples, eight metabolites were significantly different among the clusters (Supplementary Fig. 5B), with C3 exhibiting the largest differences. Notably, C3 has shifted bile acid metabolism, a key driver of maturation of the gut microbiome and a correlative factor in infant cholestasis72,73, and reduced thiamine production, an essential micronutrient74. C3 also skews the short-chain fatty acid profile towards higher butyrate production, a fermentative end-product of many Clostridium species. Aromatic amino acid metabolism is substantially shifted among the DMM clusters, leading to reduced aromatic lactic acids in C3 and C2. Aromatic lactic acids, and indole-3-lactate (ILA) in particular, are associated with Bifidobacterium and recognized for their immune modulating properties26,32,75,76. The reduction in ILA production in C2 (B. longum rich) relative to C1 (B. breve rich) suggests that it is not only the presence of Bifidobacterium but the individual species that drive aromatic amino acid metabolism. pH values for the infant fecal samples are highest in C3 and lowest in C1 (Supplementary Fig. 4C), a reflection of their Bifidobacterium content and the production of acetate and lactate77,78. Once again, C1 and C2 are significantly differentiated, suggesting species specific metabolic differences among Bifidobacterium.
Microbe-metabolite network analysis identifies patterns involving infant Bifidobacterium
To understand potential host-microbiome interactions, we generated a microbe-metabolite network to provide insight into patterns of co-occurrence and co-exclusion. Our network was decomposed into smaller sub-structures called modules, which are highly connected groups of features. All four previously mentioned infant Bifidobacterium were found in the same module (Fig. 5). We observed a three-way positive association among Bifidobacterium infantis, Bifidobacterium longum, and Bifidobacterium breve, while Bifidobacterium bifidum interacted only with Bifidobacterium infantis. Bifidobacterium breve, Bifidobacterium longum, and Bifidobacterium infantis were all positively associated with indole-3-lactate, 4-hydroxyphenyllactate, and thiamine, critical metabolites for immune and cognitive development26,32,79. Bifidobacterium infantis has the additional effect of modifying short-chain fatty acid and bile acid distribution. Bifidobacterium longum and Bifidobacterium breve are negatively associated with potential pathogens, indicating a role in their suppression. This network suggests a complementary role for Bifidobacterium in the infant gut, and that having the proper set of Bifidobacterium in the HMO utilization niche suppresses pathogenic species and positively shifts metabolism in the infant gut.
This sparse inverse covariance network highlights associations in breastfed infants, centering on the infant Bifidobacterium module. Nodes are colored to distinguish Bifidobacterium (blue) and other microbes (purple) from metabolites (gray). Edges represent the inverse covariance, with their colors indicating the types of correlation: red for negative and black for positive. Only edges with an absolute covariance >0, a Pearson adjusted p value < 0.05 and an absolute correlation coefficient >0.2 are shown. This approach reduces the spurious correlations typically found in correlation networks.
2-year health outcomes and associated microbiome features
Health outcome data was collected at two years of age to evaluate the connection of the microbiome to allergies, eczema/dermatitis, and asthma, all immune conditions that are part of the atopic march80. From the 412 initial participants, we received 210 follow-up health surveys at 2 years of age. 53.8% of parents reported antibiotic use between birth and 2 years of age (Fig. 6A), and 30.0% reported an adverse health outcome (Fig. 6B) based on a pediatrician’s diagnosis (allergies (12.4%), eczema/dermatitis (21.0%), or asthma (3.3%)). Based on this data, we calculated relative risk as a function of DMM cluster (Fig. 6C), controlling for antibiotic use by age 2. We found that relative to C1, individuals in C2 were 3.2 times as likely to develop an adverse outcome (p value = 0.034) and individuals in C3 were 3.0 times as likely to develop an adverse outcome (p value = 0.036). As a point of comparison, use of antibiotics in the first two years of life led to a 3.3-fold increase in the likelihood of developing an adverse outcome (p value = 0.003). Since infant Bifidobacterium drive separation of the samples along the principal coordinates, we calculated relative risk for the development of an adverse outcome as a function of infant Bifidobacterium as well (Supplementary Table 4). We observed a 3.1-fold reduction in relative risk (p value < 0.001) as a function of infant Bifidobacterium abundance. We were intrigued that the relative risk was higher in C2 despite the high presence of B. longum, an infant Bifidobacterium, so we evaluated relative risk as a function of the individual species B. breve and B. longum, the microbial drivers of C1 and C2 respectively (Supplementary Table 4). We found that B. breve had a statistically significant impact on reducing relative risk, with the presence of B. breve leading to a 4.8-fold reduction in relative risk (p value < 0.001). For B. longum, there was not a statistically significant impact on relative risk, although directionally the presence of B. longum led to a 1.8-fold reduction in relative risk (p value = 0.453). Although validation of these observations must be carried out in other cohorts, this data strongly suggests that the composition of the infant gut microbiome impacts atopic disease development.
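For readers less familiar with relative risk, the sketch below computes an unadjusted relative risk with a 95% confidence interval from a 2×2 table using the standard log-scale (Katz) interval. It is a simplification: the study derived its estimates from a GLM adjusted for antibiotic use, which this example does not do, and the counts are hypothetical rather than taken from the cohort.

```python
import math


def relative_risk(group_cases, group_total, ref_cases, ref_total, z=1.96):
    """Unadjusted relative risk versus a reference group, with a log-scale 95% CI."""
    risk_group = group_cases / group_total
    risk_ref = ref_cases / ref_total
    rr = risk_group / risk_ref
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(
        1 / group_cases - 1 / group_total + 1 / ref_cases - 1 / ref_total
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 30 of 80 infants with an adverse outcome in one cluster,
# 5 of 40 in the reference cluster.
rr, lower, upper = relative_risk(30, 80, 5, 40)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under this reading, a relative risk of 3.0 means the outcome is three times as likely as in the reference group, and a "4.8-fold reduction" would presumably correspond to a relative risk of about 1/4.8 ≈ 0.21 for infants carrying the species.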
2-year follow-up survey responses were received from 210 participant families. a Statistics on antibiotic use are shown, with 53.8% of responding families reporting antibiotic use between birth and 2 years of age. b Responses on health outcomes are shown with the fractions of responders reporting allergies (n = 26), eczema/dermatitis (n = 44), and asthma (n = 7); 63 infants had at least one pediatrician-diagnosed adverse health outcome. c Relative risk (RR) plot showing GLM-derived associations between the infant gut microbiome clusters (C1, C2, C3) and adverse health outcomes by two years of age, adjusted for antibiotic exposure within the first two years. Points represent RR with 95% confidence intervals, and the dashed line indicates the baseline risk (C1 cluster; RR = 1). The impact of antibiotic use is shown for comparison. d Barplot showing gene cluster features associated with adverse outcomes, identified via a logistic regression machine learning model adjusted for antibiotic exposure in the first two years of life. Bars represent the average coefficient effect size from 10 cross-validation models, with colors indicating feature types: virulence factors (red) and phage-associated features (blue).
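The counts in this caption can be checked against the percentages quoted in the text with a few lines of arithmetic. The numbers below are taken from the caption; only the variable names are introduced here.

```python
respondents = 210
counts = {
    "allergies": 26,
    "eczema/dermatitis": 44,
    "asthma": 7,
    "any adverse outcome": 63,
}

# Percent of responding families reporting each outcome, to one decimal place.
percent = {name: round(100 * n / respondents, 1) for name, n in counts.items()}
print(percent)

# Individual diagnoses sum to more than the number of affected infants,
# so some infants must have received more than one diagnosis.
surplus_diagnoses = counts["allergies"] + counts["eczema/dermatitis"] + counts["asthma"] \
    - counts["any adverse outcome"]
print(surplus_diagnoses)
```

The results match the reported 12.4%, 21.0%, 3.3%, and 30.0%, and the surplus of 14 diagnoses over affected infants shows that the three conditions overlap within the cohort.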
In addition to taxonomic associations, the connection between function and development of adverse health outcomes was evaluated with an ortholog clustering approach that allowed clustering of genes into multimember families to increase their significance in the dataset (Supplementary Fig. 5). We employed a multi-step feature selection strategy combining L1-regularized logistic regression with Recursive Feature Elimination (RFE) to identify key stable features associated with adverse outcomes while controlling for antibiotic use. This analysis was conducted on a dataset containing approximately 224,000 features. Figure 6D highlights the top seven features identified by RFE, which were associated with a higher likelihood of adverse outcomes (Fig. 6D and Supplementary Data 1). It is important to note that RFE removes features based on their contributions to the model and discards redundant features, as highly correlated variables often are. Consequently, several other gene clusters (GCs) may have been equally important to the model as the features shown here; however, only one was selected as the representative.
Among the top seven GCs, two were phage associated: phage repressor protein and endonuclease. Notably, the phage repressor protein came from a GC consisting of 81 genes, all belonging to the Proteobacteria phylum. This observation is in line with recent associations between phage and infant asthma in the COPSAC cohort81. Three other GCs were virulence factor associated: lipopolysaccharide biosynthesis (LPS), pneumococcal surface protein, and Type 7 secretion system ESAT-6. All three are known to play roles in immune evasion or modulation82,83,84. Interestingly, the LPS GC is predominantly composed of genes from Firmicutes (373/422 genes).
Discussion
Although numerous studies have highlighted the infant microbiome as a risk factor for immune-related disorders, large studies in infants in the United States are lacking22,85. The My Baby Biome study was launched to better characterize and understand the gut microbiome and metabolome of a representative infant population across the United States. The study focused on the gut microbiome in the infants' first 100 postnatal days86, a critical time window for immune development and predisposition to NCDs26,27,28.
The gut microbiomes of roughly 25% of infants lacked detectable Bifidobacterium, key microbes that have been associated with reduced NCD burden in the Old Order Mennonite population in the United States9. B. infantis, which predominates in the non-industrialized world7,9, was missing in 92% of infants. Although it has recently been suggested that postnatal environmental transfer of Bifidobacterium can be a dominant force for colonization in infants38, the paucity of Bifidobacterium observed in this study is divergent from less industrialized populations7. If postnatal horizontal transfer of Bifidobacterium is indeed a common method of acquisition, it appears its occurrence at this critical age is greatly reduced in infants in the United States.
With the disappearance of infant Bifidobacterium, opportunistic bacteria that can metabolize HMOs occupy the HMO utilization niche even though their capacity for HMO utilization is limited57,58. In cluster C2, replacements are often of the Bacteroidaceae family, using mucin utilization genes to consume HMOs. These replacements fail to provide the same metabolic benefits as Bifidobacterium, such as ILA production. In hunter-gatherer populations like the Hadza, this early replacement with mucin utilizers is rare, as their traditional Bacteroidota phylum representatives, Prevotella, lack substantial mucin utilization capabilities87. The Western diet and lifestyle have caused a proliferation of microbes capable of opportunistically using HMOs87. In C3, bacteria with the potential for pathogenicity, such as Clostridium perfringens40,41,42,43, can occupy the niche instead. We observe that these potentially pathogenic users of HMOs are particularly problematic for infants delivered via C-section. In these infants, breastfeeding does not support the establishment of Bifidobacterium but instead supports the establishment of these potentially damaging organisms. Rather than helping the gut microbiome recover from a C-section birth, breastfeeding may be inadvertently contributing to Bifidobacterium suppression.
The lack of Bifidobacterium leads to fundamental functional changes in the infant gut. As a sizable disruption in HMO utilization occurs, antimicrobial resistance gene abundance and the abundance and diversity of virulence factors increase, indicating a change in the microbial ecosystem. This change is also reflected as a change in the infant gut metabolome. The impact of these changes appears to be substantial, as notable associations between adverse immunological outcomes and microbiome composition have arisen in the data set. Overall, the data demonstrates a pattern of consistent changes in the gut microbiome contributing to a change in health outcomes, which we believe warrants the definition of dysbiosis.
The My Baby Biome study will monitor health trends for seven years, but alarming trends are already evident by two years of age, with 30% of infants suffering from allergies, asthma, and/or eczema/dermatitis. From this data we have identified two community states, clusters C2 and C3, that represent 76% of the cohort and are at higher risk for developing an adverse immunological outcome. Additionally, we have observed that infant Bifidobacterium have a protective effect, reducing relative risk in the population. We have also identified genes associated with a higher likelihood of developing adverse outcomes, including a pair of phage-associated genes and a group of three virulence factor-associated immune-modulating genes, which should be validated in other cohorts. We expect further health impacts will become apparent as the cohort ages. The lack of longitudinal sampling and site-based access to medical records may ultimately limit some insights from the study, and the lack of in-depth analysis of the feeding method (including maternal secretor status and exact formula use) precludes a deep understanding of how the nascent microbiome responds to nutrition. However, we are confident that we have established a robust longitudinal study to explore the impact of the nascent microbiome on health outcomes. In addition to the My Baby Biome study, continued work in this field is necessary to understand how the development of the infant gut microbiota impacts health trajectories.
Antibiotic use, lifestyle choices, and dietary habits have all fundamentally shifted the microbiome in industrialized society4,88. In infants this shift can be deleterious, as the microbiome plays a crucial role in immune development26,28. Even among vaginally born, breastfed infants, the presence of Bifidobacterium, the cornerstone infant microbes, is not guaranteed. It is likely that this problem will only worsen as key species, such as B. infantis, become less prevalent, which is a general trend in industrialized society7,10. Given the alarming rise in NCDs and their link to the infant gut microbiome, the gut microbiota offers an opportunity for early intervention with lifelong health impact.
Methods
Study population
The IRB-approved My Baby Biome study (Western Consulting Group IRB protocol PBI-2022-01; NCT05472688) was conducted as a decentralized study. Individuals were recruited to the study throughout the United States primarily through social media. After a brief questionnaire to determine eligibility, caregivers of infants roughly between the ages of one month and two months signed an informed consent form and were subsequently sent a fecal collection kit. The study cohort was selected such that it had racial and ethnic diversity comparable to the United States as well as representative levels of birth and feeding modes. Caregivers were instructed to return the fecal collection kit before the child turned 3 months old and completed an in-depth questionnaire at the time of stool collection. Additional questionnaires were collected at six months, one year, and two years and will continue to be collected on a yearly basis up until the 7th birthday.
Stool collection
For the initial collection, fecal samples were collected at home with stool sampling kits that contained Covidien Precision Stool Collector cups and spoons, ice packs to keep the sample cool, and a DNA/RNA Shield Fecal Collection Tube (Zymo Research). Initial samples were scooped immediately from the infant’s diaper and were express shipped overnight to the lab. Scooped samples were both raw fecal samples and DNA/RNA Shield Fecal Collection Tube samples. DNA/RNA Shield Fecal Collection Tubes were stored at −20 °C upon arrival. Raw fecal samples were processed upon arrival in an anaerobic chamber if they were at temperatures less than 17 °C. Prior to processing, color, weight, and consistency of the stool was noted. Phosphate-buffered saline (PBS) aliquots were generated for analysis. A separate aliquot was taken to measure the pH of the sample. Aliquots were labeled and frozen at −80 °C for long-term storage. PBS aliquots were used for whole-genome sequencing, or if the sample was over the appropriate temperature, the DNA/RNA tube was processed instead. Statistical analysis revealed that the method of DNA collection and isolation (raw fecal versus DNA/RNA tubes) had a significant impact on microbial composition (PERMANOVA, p = 0.02). Consequently, we adjusted for this confounding variable in all subsequent statistical models. Species that differed significantly based on isolation method can be found in Supplementary Table 5.
Sequencing
Total genomic DNA was extracted from a fecal PBS aliquot using the MagAttract PowerMicrobiome DNA/RNA kit (Qiagen). Genomic DNA was then prepared for whole-genome sequencing using the KAPA HyperPlus Library Prep™ kit (Roche). Sequencing was conducted on the Illumina platform using paired-end 150 bp reads. For samples obtained from DNA/RNA Shield Fecal Collection Tubes, the DNA was extracted and purified using the ZymoBIOMICS DNA Miniprep Kit. Extracted DNA was stored at −20 °C prior to use in whole-genome sequencing. Sequencing data was processed to remove low-quality reads and adapter contamination using Trim-galore89.
Microbial classification
To identify the relative abundances of species in the infant fecal samples, a custom Kraken2/Bracken90,91 index was built specific to the gut microbiome. First, 1085 gut and oral genera were identified through the Unified Human Gastrointestinal Genome (UHGG)92 and the MGnify human-oral database (http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-oral/v1.0/).
Subsequently, we analyzed our in-house large cohort of adult and infant gut microbiomes to identify additional genera not present in the UHGG and MGnify microbiome databases. This analysis identified 14 additional genera. Using GTDB taxonomy r207, we identified 17,752 unique species belonging to these genera. GTDB also provided assembly statistics for each genome corresponding to these unique species.
We downloaded 132,128 bacterial and archaeal genomes from NCBI, corresponding to our predefined unique oral and gut microbiome species, and selected those that met our quality criteria of low contamination and high genome completeness. These genomes underwent Mash-based clustering to dereplicate the assemblies93. We identified an optimal Mash threshold internally to retain strain-level information while removing nearly identical duplicate genomes (Metagenomics-Index Correction software: https://github.com/rrwick/Metagenomics-Index-Correction).
Relative abundances were computed using the Bracken method90. Mock communities were simulated to validate this classification method, demonstrating increased accuracy at the subspecies level compared to the standard GTDB classification. The classification noise for this custom classifier was measured to be 0.5%. Within each sample, only taxa with a relative abundance of ≥0.5% were considered detected, which reduces the false positive rate for species detection but also reduces true positive detection for low abundance species.
Statistics and reproducibility
When using parametric approaches to data analysis, the centered log-ratio (CLR) transformation was applied to species relative abundance after imputation of zero values using multiplicative replacement. All statistical analyses were performed in Python using either the SciPy v1.10.1 (https://docs.scipy.org/doc/scipy/index.html) or StatsModels v0.14.2 (https://www.statsmodels.org/stable/index.html) package. All p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure (FDR).
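The zero imputation and CLR step can be illustrated with a minimal sketch. It uses only the Python standard library rather than the packages named above, and the function names and the pseudo-value `delta` are assumptions chosen for illustration; the study does not state the replacement value it used.

```python
import math

def multiplicative_replacement(proportions, delta):
    """Replace zeros with `delta` and shrink non-zero parts so the total stays 1.

    `delta` is a hypothetical small pseudo-value; the study does not report one.
    """
    total = sum(proportions)
    closed = [p / total for p in proportions]          # close to a unit sum
    n_zeros = sum(1 for p in closed if p == 0)
    shrink = 1.0 - n_zeros * delta                      # mass left for non-zero parts
    return [delta if p == 0 else p * shrink for p in closed]

def clr(proportions):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Toy sample of four species, one of which was not detected.
sample = [0.6, 0.3, 0.1, 0.0]
imputed = multiplicative_replacement(sample, delta=1e-4)
clr_values = clr(imputed)
```

CLR values sum to zero by construction, which is a convenient sanity check before passing them to a parametric model.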
A chi-squared test was performed to analyze associations between two categorical variables (Fig. 1D). A generalized linear model (GLM) was used to investigate associations between two taxonomic variables while controlling for covariates, such as those described in Fig. 1G.
Analysis of covariance (ANCOVA) was employed for Fig. 1E, as well as for the human milk oligosaccharides (HMO) and carbohydrate-active enzymes (CAZY) analyses. This allowed us to control for covariates, such as the method of sample processing (PBS versus DNA collection and isolation). Additionally, ANCOVA was used to examine taxonomic differences between samples processed with PBS versus DNA/RNA isolation, identifying 18 features that were statistically different. However, none of these were among the key microbial taxa highlighted in this paper or present in particularly large numbers (Supplementary Table 5).
For non-parametric approaches to data analysis, we used feature relative abundances. The Mann-Whitney U test was applied to assess associations between two groups (Figs. 1F, 4A, C). Spearman correlation analysis was used to examine associations between two continuous variables (Fig. 4B, D).
For Fig. 6C, a GLM was used to investigate the association between pediatrician-validated adverse outcomes by two years of age and two independent variables: initial DMM cluster assignment and antibiotic exposure within the first two years. Antibiotic exposure was included as a covariate to control for its potential confounding effect. Relative risk (RR) was calculated from the GLM coefficients by first extracting the coefficients and determining the baseline probability using the model intercept. Odds ratios (OR) were obtained by exponentiating the coefficients and subsequently converted to RR using the formula:

$$\mathrm{RR} = \frac{\mathrm{OR}}{(1 - P_0) + (P_0 \times \mathrm{OR})}$$

where $P_0$ is the baseline probability derived from the model intercept.
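A minimal sketch of this conversion follows. It assumes a binomial GLM with a logit link, so that the intercept maps to a baseline probability through the logistic function, and it assumes the commonly used conversion RR = OR / ((1 − P0) + P0 × OR). The function name and the example coefficients are hypothetical and are not taken from the study.

```python
import math

def relative_risk_from_logit(intercept, coefficient):
    """Convert a logistic-regression coefficient to a relative risk.

    Assumes a logit link: the intercept gives the baseline log-odds.
    """
    baseline_probability = 1.0 / (1.0 + math.exp(-intercept))   # P0
    odds_ratio = math.exp(coefficient)                           # OR
    relative_risk = odds_ratio / (
        (1.0 - baseline_probability) + baseline_probability * odds_ratio
    )
    return baseline_probability, odds_ratio, relative_risk

# Hypothetical coefficients: baseline odds of 1 (P0 = 0.5) and an OR of 3.
p0, odds_ratio, rr = relative_risk_from_logit(intercept=0.0, coefficient=math.log(3.0))
```

With a baseline probability of 0.5, an odds ratio of 3 corresponds to a relative risk of 1.5, which shows how far the two measures can diverge when the outcome is common.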
The GLM analysis revealed an association between DMM cluster membership and pediatrician-validated adverse outcomes. Because B. longum and B. breve were among the primary taxa driving the composition of these clusters, we sought to disentangle whether the observed associations were attributable to these specific taxa or to broader emergent properties of the microbial communities captured by the clusters. To address this, we employed Generalized Estimating Equations (GEEs), which more appropriately account for the correlated structure of microbiome data and allow for testing of individual taxa-level associations. Using this approach, we specifically assessed whether (1) infant-associated Bifidobacterium, and (2) B. longum and B. breve, were independently associated with adverse outcomes. For Fig. 6D, we used a feature selection pipeline for logistic regression analysis to identify key gene clusters associated with adverse outcomes while controlling for antibiotic use in early childhood as well as DNA collection and isolation. The pipeline follows three stages: (1) L1 Regularization (Lasso) with 10 repeated cross-validation to identify consistently important features; (2) Recursive Feature Elimination (RFE) on the top 50 consistently important features to further refine the selected features; and (3) Final Model Building using the top features identified in earlier stages.
Functional analysis
Functional capacity of the infant gut was identified using gene annotations provided by the UHGG database92. We mapped paired-end reads to a catalog of 11,019,595 microbial genes, quantifying gene level relative abundance using a combination of exact k-mer matching and an Expectation Maximization (EM) algorithm to handle multi-mapped reads.
The EM algorithm probabilistically assigns each read to its most likely gene of origin based on initial mapping probabilities. This involves iteratively updating these probabilities to maximize the likelihood of the observed read distribution, taking into account the current estimates of gene abundance levels. Through this process, the EM algorithm refines the assignment of multi-mapped reads, improving the accuracy of gene-level abundance estimates. Within each sample, only genes with an expected read count of ≥3 were considered detected.
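The iterative reassignment described above can be sketched as follows. This is a simplified illustration and not the study's implementation: it assumes every alignment of a read is equally plausible a priori, so the E-step weights candidate genes only by their current abundance estimates. The function name, the read representation, and the toy data are hypothetical.

```python
def em_gene_abundance(read_hits, n_iter=50):
    """Estimate gene abundances from reads that may map to several genes.

    read_hits: list of tuples, each holding the distinct genes one read maps to.
    Returns (relative abundance per gene, expected read count per gene).
    """
    genes = sorted({gene for hits in read_hits for gene in hits})
    abundance = {gene: 1.0 / len(genes) for gene in genes}   # uniform start
    expected = {gene: 0.0 for gene in genes}
    for _ in range(n_iter):
        # E-step: split each read across its candidate genes by current abundance.
        expected = {gene: 0.0 for gene in genes}
        for hits in read_hits:
            weight_sum = sum(abundance[gene] for gene in hits)
            for gene in hits:
                expected[gene] += abundance[gene] / weight_sum
        # M-step: new abundance is the share of all reads assigned to the gene.
        abundance = {gene: expected[gene] / len(read_hits) for gene in genes}
    return abundance, expected

# Toy data: four reads unique to geneA, two multi-mapped, two unique to geneB.
reads = [("geneA",)] * 4 + [("geneA", "geneB")] * 2 + [("geneB",)] * 2
abundance, expected = em_gene_abundance(reads)

# Detection rule from the text: expected read count of at least 3.
detected = {gene for gene, count in expected.items() if count >= 3}
```

In this toy case the multi-mapped reads are drawn toward geneA because its unique reads already make it the more abundant candidate.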
After assigning reads to genes, we aggregated the gene-level abundances by functional annotation of choice such as human milk oligosaccharide (HMO) genes, carbohydrate-active enzymes (CAZymes), antimicrobial resistance genes, and virulence factors.
HMO Identification and quantification
We focused on six well-defined human milk oligosaccharide (HMO) gene clusters identified in the genome of Bifidobacterium longum subsp. infantis (H1, H2, H3, H4, H5, and Urease). To account for genomic variation in HMO-related genes among different bacteria beyond Bifidobacterium, we focused on the KEGG Orthology (KO) group and protein product name for each of the 59 genes found in these six gene clusters, as provided by our functional classifier.
Due to HMO enzymatic promiscuity and the fact that some individual genes within the HMO clusters can share the same KO ID, we also included gene number thresholds for each gene cluster. This allowed us to determine whether a set of HMO gene clusters and their genes were present. If a gene cluster met our predefined thresholds (H1: 8/20 genes present; H2: 2/4 genes present; H3: 1/4 genes present; H4: 7/12 genes present; Urease: 8/12 genes present), we considered the gene cluster to be present. We then counted the individual KO gene counts and divided by the total number of identified classified genes in the sample to obtain relative abundance.
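The threshold rule can be expressed as a short sketch. The thresholds are those listed in the text, reading the H3 entry as 1 of 4 genes. No threshold is given for H5, so it is left out here instead of guessed. The function names and the example counts are hypothetical.

```python
# (minimum genes required, total genes in cluster), as listed in the text.
CLUSTER_THRESHOLDS = {
    "H1": (8, 20),
    "H2": (2, 4),
    "H3": (1, 4),      # assumption: the H3 entry is read as 1 of 4 genes
    "H4": (7, 12),
    "Urease": (8, 12),
}

def clusters_present(genes_detected):
    """Return, per cluster, whether enough distinct genes were detected."""
    return {
        cluster: genes_detected.get(cluster, 0) >= minimum
        for cluster, (minimum, _total) in CLUSTER_THRESHOLDS.items()
    }

def relative_abundance(hmo_gene_count, total_classified_genes):
    """HMO gene count divided by all classified genes in the sample."""
    return hmo_gene_count / total_classified_genes

# Hypothetical sample: distinct genes detected per cluster.
sample_counts = {"H1": 8, "H2": 1, "H3": 1, "H4": 6, "Urease": 12}
presence = clusters_present(sample_counts)
hmo_fraction = relative_abundance(hmo_gene_count=30, total_classified_genes=60000)
```

Keeping the thresholds in one table makes it straightforward to revisit them if the histogram-based calibration is rerun on a different subset.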
The gene thresholds were determined by running a subset of baby biome samples (n = 80) through a DIAMOND HMO pipeline94. The HMO database was built using proteins from the six well-defined HMO gene clusters identified in the genome of Bifidobacterium longum subsp. infantis. We examined histograms representing the distribution of gene presence within each gene cluster and used internal domain knowledge to select our predefined thresholds.
To identify the main taxonomic contributor of HMO genes for a given sample, we traced each KO and/or protein product name back to its gene of origin. This enabled us to link the gene to its corresponding taxonomic genome. We then determined which taxa contributed the most HMO genes and identified these as our top taxonomic contributors.
CAZy enzyme identification
To determine the relative abundance of sialidase and fucosidase enzymes in our metagenomes, we obtained the CAZy data95, specifically using the file available at https://bcb.unl.edu/dbCAN2/download/Databases/dbCAN-sub.substrate.mapping.xls. This file enables us to link the Enzyme Commission number (EC number) to our functional database. Subsequently, we calculated the relative abundance of all identified CAZy genes and selectively filtered them by the enzyme column to isolate relevant enzymes (sialidase and fucosidase).
Antibiotic resistance gene identification
All microbial genes in our functional classifier were screened for antibiotic resistance using the DIAMOND program to perform a BLASTX-type search against the NCBI antimicrobial resistance database. The DIAMOND outputs were parsed and filtered to collect the top hits for each gene, considering positive hits with (i) E-values ≤ 1e-50 and (ii) bitscores >500. We used this information to flag genes that conferred antimicrobial resistance in our functional database.
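The hit filtering can be sketched as follows, assuming DIAMOND was run with its default 12-column tabular output (the BLAST tabular layout, with E-value and bitscore in the last two columns) and that the "top hit" is the passing hit with the highest bitscore. Both are assumptions, as are the function name and the example rows.

```python
def top_hits(tabular_lines, max_evalue=1e-50, min_bitscore=500.0):
    """Keep, per query gene, the best hit passing both thresholds.

    Assumes BLAST-style tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in tabular_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue                                  # skip malformed rows
        qseqid, sseqid = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue or bitscore <= min_bitscore:
            continue                                  # fails a threshold
        if qseqid not in best or bitscore > best[qseqid][2]:
            best[qseqid] = (sseqid, evalue, bitscore)
    return best

# Hypothetical DIAMOND rows (tab-separated).
rows = [
    "gene1\tAMR_x\t98.0\t400\t8\t0\t1\t1200\t1\t400\t1e-120\t820.0",
    "gene1\tAMR_y\t90.0\t400\t40\t0\t1\t1200\t1\t400\t1e-90\t610.0",
    "gene2\tAMR_z\t60.0\t300\t120\t0\t1\t900\t1\t300\t1e-30\t350.0",
    "gene3\tAMR_w\t99.0\t200\t2\t0\t1\t600\t1\t200\t1e-70\t480.0",
]
flagged = top_hits(rows)
```

Only `gene1` is flagged here: `gene2` fails both thresholds and `gene3` passes the E-value cutoff but not the bitscore cutoff.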
Virulence Factor Gene Identification
All microbial genes in our functional classifier were screened for virulence factors using the DIAMOND program to perform a BLASTX search against the full virulence factor database (VFDB)96. The DIAMOND outputs were parsed and filtered to collect the top hits for each gene, considering positive hits with (i) amino acid percent identity >95% and (ii) bitscores >500. We used this information to flag virulence factor genes in our functional database.
Metabolomics
Targeted metabolomics was performed at Precion Inc. (https://www.precion.com/) using a microbiome panel based on isotopically labeled internal standards. 157 samples were submitted for metabolomics analysis, 109 of which were from breastfed infants. Samples were provided as PBS aliquots and then dried to determine a mass for normalization. Samples were then homogenized and proteins precipitated. Extracted metabolites were evaluated using a Sciex Exion LC/Sciex 5500+ Triple Quadrupole Mass Spectrometer LC–MS/MS system and four different methods. Aliphatic organic acids, aromatic organic acids, and other negatively charged analytes were evaluated in negative mode using C18 reversed phase chromatography. Amino acids, amines, and other positively charged analytes were evaluated in positive mode using C18 reversed phase chromatography. Short chain fatty acids and lactic acid were derivatized with a substituted hydrazine and analyzed in negative mode using C18 reversed phase chromatography. Phenolic and indole metabolites were derivatized with a substituted sulfonyl chloride and analyzed in polarity switching mode using C18 reversed phase chromatography. 
Using these four methods, a panel was evaluated that includes: 2-methylbutyrate, 3-hydroxybenzoate, 3-hydroxyhippurate, 3-hydroxyphenylpropionate, 3-methylindole, 4-ethylphenol, 4-ethylphenylsulfate, 4-hydroxyphenylacetate, 4-hydroxyphenylacrylate, 4-hydroxyphenyllactate, 4-hydroxyphenylpropionate, acetate, agmatine, arginine, benzoate, betaine, butyrate, cadaverine, carnitine, chenodeoxycholate, cholate, choline, cinnamoylglycine, citrulline, deoxycholate, enterodiol, enterolactone, glycochenodeoxycholate, glycocholate, hexanoate, hippurate, imidazole propionate, indole, indole-3-acetamide, indole-3-lactate, indole-3-propionate, indoleacetate, indoleacetylglycine, indoleacrylate, indoleacrylglycine, indoxyl sulfate, inosine, isobutyrate, isoleucine, isovalerate, kynurenate, kynurenine, lactate, leucine, lithocholate, lysine, N-acetylserotonin, ornithine, p-cresol, p-cresol glucuronide, p-cresol sulfate, phenol, phenol glucuronide, phenol sulfate, phenylacetate, phenylacetylglutamine, phenylacetylglycine, phenylalanine, phenyllactate, phenylpropionate, phenylpropionylglycine, phenylpyruvate, propionate, putrescine, serotonin, thiamine, trimethylamine, tryptamine, tryptophan, tyramine, tyrosine, ursodeoxycholate, valerate, valine.
Microbe-metabolite network construction
Microbe-metabolite co-occurrence networks were constructed using taxonomic and metabolomics features from 109 breastfed infants. Initially, taxonomic and metabolomics data were treated as independent data frames. The counts were transformed to relative abundances. Following this, the centered log-ratio (CLR) transformation97 was applied to the data after imputing zero values using multiplicative replacement98 to address issues of compositionality. Finally, the two data frames were merged for subsequent analysis. Correlations and associated p-values were obtained using the Pearson correlation method, and p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure (FDR)99.
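The false discovery rate adjustment applied to the correlation p-values can be illustrated with a minimal standard-library sketch of the Benjamini-Hochberg step-up procedure. It is shown only to make the adjustment concrete; the study used the Python packages named in the statistics section, and the function name and example p-values here are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical p-values from four microbe-metabolite correlations.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
```

Each adjusted value is the smallest FDR level at which that correlation would still be called significant.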
A network consists of nodes and edges, where nodes represent individual features and edges represent the connections between those features. This network contained 211 features and 724 connections. The undirected graphical network was generated by estimating a sparse inverse covariance (precision) matrix with L1 regularization using GraphicalLassoCV (https://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphicalLassoCV.html).
By examining the precision matrix, we can discern genuine correlations from spurious ones, as it highlights conditional dependencies among the variables. GraphicalLassoCV uses cross-validation to select the optimal regularization parameter, ensuring the model is neither overfitted nor underfitted and generalizes well to unseen data. The L1 regularization introduces sparsity, forcing many estimated connections to be zero, resulting in a simpler and more interpretable network. This is especially important for high-dimensional data, as it reduces the risk of overfitting and enhances network interpretability. NetworkX was used to visualize the graphs from the resulting output (https://networkx.org/documentation/stable/index.html).
Protein gene clustering using Markov Clustering Algorithm
Knowledge-based approaches (i.e., KEGG, COG, Gene Ontology) leave a large portion of predicted genes uncharacterized and naïve approaches provide an overwhelming amount of data that is difficult to parse, so we developed an ortholog clustering pipeline that allows gene families to be grouped together to increase their significance in a data set. This allows uncharacterized genes to be analyzed while still providing a manageable framework.
We converted 11,019,595 microbial genes from our functional database into an amino acid FASTA (FNA) file. From this file, we generated another FNA file by removing gene duplicates. This deduplicated FNA file was then used to build a DIAMOND database. We employed an all-versus-all DIAMOND approach, where the full FNA file was queried against the DIAMOND database, similar to the method previously described by Harlow et al.100. We focused on three DIAMOND output columns: qseqid, sseqid, and evalue. The data were parsed and filtered further, then passed through a Markov Clustering Algorithm (MCL) using the -abc-neg-log10 transformation101.
The MCL algorithm works with the input similarity file by transforming it into a matrix representing a graph. It then uses a series of matrix operations (expansion and inflation) to discover clusters within the graph. Each protein or gene sequence is treated as a node in a network, with edges representing sequence similarities based on the -log10(E-value). This transformation results in a symmetric matrix where the edge weights signify similarity: higher weights indicate more similar proteins, while lower weights indicate less similar ones.
Clusters are characterized by many heavily weighted edges between members, forming tight-knit groups with shorter paths (edge lengths) within a cluster. Conversely, proteins with no similar counterparts form single-protein and multi-protein clusters with long paths to other nodes in the network. The MCL algorithm helps us construct these networks and probabilistically identify protein gene clusters, resulting in a final matrix that can be interpreted as protein family clusters. The inflation value parameter of the MCL controls the granularity or “tightness” of these clusters.
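The expansion and inflation steps can be illustrated with a small pure-Python sketch. This is not the MCL program itself. It assumes that edges are symmetrized by keeping the larger weight, that E-values of zero are capped at a weight of 200, and that each node receives a self-loop equal to its largest edge weight; these choices, the function names, and the toy edges are all illustrative assumptions.

```python
import math

def neg_log10(evalue, cap=200.0):
    """Edge weight from an E-value; the cap for E-values of 0 is an assumption."""
    return cap if evalue <= 0.0 else min(cap, -math.log10(evalue))

def build_matrix(edges, nodes):
    """Symmetric weight matrix with self-loops from (query, subject, evalue) rows."""
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for query, subject, evalue in edges:
        if query == subject:
            continue
        i, j = index[query], index[subject]
        weight = max(0.0, neg_log10(evalue))
        matrix[i][j] = matrix[j][i] = max(matrix[i][j], weight)
    for i in range(size):
        matrix[i][i] = max(matrix[i]) or 1.0      # self-loop; 1.0 for isolated nodes
    return matrix

def normalize_columns(matrix):
    size = len(matrix)
    sums = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    return [[matrix[i][j] / sums[j] for j in range(size)] for i in range(size)]

def mcl(edges, nodes, inflation=2.0, n_iter=50):
    size = len(nodes)
    m = normalize_columns(build_matrix(edges, nodes))
    for _ in range(n_iter):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(size)) for j in range(size)]
             for i in range(size)]
        # Inflation: raise entries to a power, then renormalize columns.
        m = normalize_columns([[value ** inflation for value in row] for row in m])
    clusters = {frozenset(j for j, value in enumerate(row) if value > 1e-6) for row in m}
    return sorted(sorted(nodes[j] for j in cluster) for cluster in clusters if cluster)

# Toy all-versus-all hits: two groups of similar proteins and one singleton.
toy_edges = [("a", "b", 1e-60), ("a", "c", 1e-60), ("b", "c", 1e-60), ("d", "e", 1e-80)]
toy_clusters = mcl(toy_edges, ["a", "b", "c", "d", "e", "f"])
```

Raising `inflation` sharpens the contrast between strong and weak edges, which is why it controls how finely the graph is split; on real data the dedicated MCL program should be used rather than this dense-matrix sketch.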
This protein clustering approach has been shown to be effective at identifying protein families, including proteins with multi-domain structures and promiscuous domains (domains that are present in many families), with both internal and external validation102.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The authors declare that the main data supporting the findings of this study are available within the article and its Supplementary Information files (Supplementary Data 2–6). Due to data sensitivity, additional data in the form of anonymized fastq files will be provided upon request via a link to a private repository and can be used for research purposes only. Data is available from the corresponding author (Stephanie Culler, sculler@persephonebiome.com).
References
-
Wang, H. et al. Global age-sex-specific fertility, mortality, healthy life expectancy (HALE), and population estimates in 204 countries and territories, 1950–2019: a comprehensive demographic analysis for the Global Burden of Disease Study 2019. Lancet 396, 1160–1203 (2020).
-
Hanson, M. A. & Gluckman, P. D. Later health and disease: physiology or pathophysiology? Physiol. Rev. 94, 1027–1076 (2014).
-
Strachan, D. P. Hay fever, hygiene, and household size. BMJ 299, 1259–1260 (1989).
-
Sonnenburg, E. D. & Sonnenburg, J. L. The ancestral and industrialized gut microbiota and implications for human health. Nat. Rev. Microbiol. 17, 383–390 (2019).
-
Blaser, M. J. & Falkow, S. What are the consequences of the disappearing human microbiota?. Nat. Rev. Microbiol. 7, 887–894 (2009).
-
Rook, G. & Rosa Brunet, L. Old friends for breakfast. Clin. Exp. Allergy 35, 841–842 (2005).
-
Olm, M. R. et al. Robust variation in infant gut microbiome assembly across a spectrum of lifestyles. Science 376, 1220–1223 (2022).
-
Wong, C. B., Huang, H., Ning, Y. & Xiao, J. Probiotics in the new era of Human Milk Oligosaccharides (HMOs): HMO utilization and beneficial effects of Bifidobacterium longum subsp. infantis M-63 on infant health. Microorganisms 12, 1014 (2024).
-
Seppo, A. E. et al. Infant gut microbiome is enriched with Bifidobacterium longum ssp. infantis in old order mennonites with traditional farming lifestyle. Allergy 76, 3489–3503 (2021).
-
Taft, D. H. et al. Bifidobacterium species colonization in infancy: a global cross-sectional comparison by population history of breastfeeding. Nutrients 14, 1423 (2022).
-
Stinson, L. F. Establishment of the early-life microbiome: a DOHaD perspective. J. Dev. Orig. Health Dis. 11, 201–210 (2020).
-
Carlson, A. L. et al. Infant gut microbiome associated with cognitive development. Biol. Psychiatry 83, 148–159 (2018).
-
Koleva, P. T., Bridgman, S. L. & Kozyrskyj, A. L. The infant gut microbiome: Evidence for obesity risk and dietary intervention. Nutrients 7, 2237–2260 (2015).
-
Stanislawski, M. A. et al. Gut microbiota in the first 2 years of life and the association with body mass index at age 12 in a Norwegian birth cohort. mBio 9, e01751–18 (2018).
-
Depner, M. et al. Maturation of the gut microbiome during the first year of life contributes to the protective farm effect on childhood asthma. Nat. Med. 26, 1766–1775 (2020).
-
Vatanen, T. et al. The human gut microbiome in early-onset type 1 diabetes from the TEDDY study. Nature 562, 589–594 (2018).
-
Hoskinson, C. et al. Delayed gut microbiota maturation in the first year of life is a hallmark of pediatric allergic disease. Nat. Commun. 14, 4785 (2023).
-
Davis, E. C., Monaco, C., Insel, R. & Järvinen, K. M. Gut microbiome in the first 1000 days and risk for childhood food allergy. Ann. Allergy Asthma Immunol. 133, 252–261 (2024).
-
Donald, K. & Finlay, B. B. Early-life interactions between the microbiota and immune system: impact on immune system development and atopic disease. Nat. Rev. Immunol. 23, 735–748 (2023).
-
Huda, M. N. et al. Bifidobacterium abundance in early infancy and vaccine response at 2 years of age. Pediatrics 143, e20181489 (2019).
-
Huda, M. N. et al. Stool microbiota and vaccine responses of infants. Pediatrics 134, e362–e372 (2014).
-
Casaburi, G. et al. Metagenomic insights of the infant microbiome community structure and function across multiple sites in the United States. Sci. Rep. 11, 1472 (2021).
-
Jones, N. 2020 Census Results on Race and Ethnicity. https://www.census.gov/content/dam/Census/newsroom/press-kits/2021/redistricting/20210812-presentation-redistricting-jones.pdf (U.S. Census Bureau, 2020).
-
Breastfeeding Report Card. https://www.cdc.gov/breastfeeding-data/breastfeeding-report-card/?CDC_AAref_Val=https://www.cdc.gov/breastfeeding/data/reportcard.htm (CDC, 2022).
-
Martin, J. A., Hamilton, B. E. & Osterman, M. J. K. Key Findings Data from the National Vital Statistics System. https://www.cdc.gov/nchs/products/index.htm (CDC, 2022).
-
Henrick, B. M. et al. Bifidobacteria-mediated immune system imprinting early in life. Cell 184, 3884–3898.e11 (2021).
-
Olin, A. et al. Stereotypic immune system development in newborn children. Cell 174, 1277–1292.e14 (2018).
-
Sarkar, A., Yoo, J. Y., Valeria Ozorio Dutra, S., Morgan, K. H. & Groer, M. The association between early-life gut microbiota and long-term health and diseases. J. Clin. Med. 10, 459 (2021).
-
Enav, H., Bäckhed, F. & Ley, R. E. The developing infant gut microbiome: a strain-level view. Cell Host Microbe 30, 627–638 (2022).
-
Bokulich, N. A. et al. Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci. Transl. Med. 8, 343ra82 (2016).
-
Barker-Tejeda, T. C. et al. Comparative characterization of the infant gut microbiome and their maternal lineage by a multi-omics approach. Nat. Commun. 15, 3004 (2024).
-
Laursen, M. F. et al. Bifidobacterium species associated with breastfeeding produce aromatic lactic acids in the infant gut. Nat. Microbiol. 6, 1367–1382 (2021).
-
Ojima, M. N. et al. Priority effects shape the structure of infant-type Bifidobacterium communities on human milk oligosaccharides. ISME J. 16, 2265–2279 (2022).
-
Saturio, S. et al. Early-life development of the Bifidobacterial community in the infant gut. Int. J. Mol. Sci. 22, 3382 (2021).
-
Wong, C. B., Odamaki, T. & Xiao, J. Insights into the reason of Human-Residential Bifidobacteria (HRB) being the natural inhabitants of the human gut and their potential health-promoting benefits. FEMS Microbiol. Rev. 44, 369–385 (2020).
-
Ho, N. T. et al. Meta-analysis of effects of exclusive breastfeeding on infant gut microbiota across populations. Nat. Commun. 9, 4169 (2018).
-
Matharu, D. et al. Bacteroides abundance drives birth mode dependent infant gut microbiota developmental trajectories. Front. Microbiol. 13, 953475 (2022).
-
Ennis, D., Shmorak, S., Jantscher-Krenn, E. & Yassour, M. Longitudinal quantification of Bifidobacterium longum subsp. infantis reveals late colonization in the infant gut independent of maternal milk HMO composition. Nat. Commun. 15, 894 (2024).
-
Martin, R. M. & Bachman, M. A. Colonization, infection, and the accessory genome of Klebsiella pneumoniae. Front. Cell Infect. Microbiol. 8, 4 (2018).
-
Freedman, J., Shrestha, A. & McClane, B. Clostridium perfringens enterotoxin: action, genetics, and translational applications. Toxins 8, 73 (2016).
-
Navarro, M. A., McClane, B. A. & Uzal, F. A. Mechanisms of action and cell death associated with Clostridium perfringens toxins. Toxins 10, 212 (2018).
-
Ma, Y. et al. Epsilon toxin–producing Clostridium perfringens colonize the multiple sclerosis gut microbiome overcoming CNS immune privilege. J. Clin. Invest. 133, e163239 (2023).
-
Kiu, R. et al. Particular genomic and virulence traits associated with preterm infant-derived toxigenic Clostridium perfringens strains. Nat. Microbiol. 8, 1160–1175 (2023).
-
Marcobal, A. et al. Consumption of human milk oligosaccharides by gut-related microbes. J. Agric. Food Chem. 58, 5334–5340 (2010).
-
Holmes, I., Harris, K. & Quince, C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE 7, e30126 (2012).
-
Chen, J. et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics 28, 2106–2113 (2012).
-
Murata, C. et al. Delivery mode-associated gut microbiota in the first 3 months of life in a country with high obesity rates: a descriptive study. Medicine 99, e22442 (2020).
-
Shao, Y. et al. Primary succession of Bifidobacteria drives pathogen resistance in neonatal microbiota assembly. Nat. Microbiol. 9, 2570–2582 (2024).
-
Jokela, R. et al. A cohort study in family triads: impact of gut microbiota composition and early life exposures on intestinal resistome during the first two years of life. Gut Microbes 16, 2383746 (2024).
-
Ballard, O. & Morrow, A. L. Human milk composition. Pediatr. Clin. North Am. 60, 49–74 (2013).
-
Xiao, L. et al. Human milk oligosaccharides promote immune tolerance via direct interactions with human dendritic cells. Eur. J. Immunol. 49, 1001–1014 (2019).
-
Plaza-Díaz, J., Fontana, L. & Gil, A. Human milk oligosaccharides and immune system development. Nutrients 10, 1038 (2018).
-
Walsh, C., Lane, J. A., van Sinderen, D. & Hickey, R. M. Human milk oligosaccharides: shaping the infant gut microbiota and supporting health. J. Funct. Foods 72, 104074 (2020).
-
Sela, D. A. et al. The genome sequence of Bifidobacterium longum subsp. infantis reveals adaptations for milk utilization within the infant microbiome. Proc. Natl. Acad. Sci. USA 105, 18964–18969 (2008).
-
Lawson, M. A. E. et al. Breast milk-derived human milk oligosaccharides promote Bifidobacterium interactions within a single ecosystem. ISME J. 14, 635–648 (2020).
-
Lordan, C. et al. Linking human milk oligosaccharide metabolism and early life gut microbiota: bifidobacteria and beyond. Microbiol. Mol. Biol. Rev. 88, e0009423 (2024).
-
Thomson, P., Medina, D. A. & Garrido, D. Human milk oligosaccharides and infant gut bifidobacteria: molecular strategies for their utilization. Food Microbiol. 75, 37–46 (2018).
-
Salli, K. et al. Selective utilization of the human milk oligosaccharides 2′-fucosyllactose, 3-fucosyllactose, and difucosyllactose by various probiotic and pathogenic bacteria. J. Agric. Food Chem. 69, 170–182 (2021).
-
Marcobal, A. et al. Bacteroides in the infant gut consume milk oligosaccharides via mucus-utilization pathways. Cell Host Microbe 10, 507–514 (2011).
-
LoCascio, R. G., Desai, P., Sela, D. A., Weimer, B. & Mills, D. A. Broad conservation of milk utilization genes in Bifidobacterium longum subsp. infantis as revealed by comparative genomic hybridization. Appl. Environ. Microbiol. 76, 7373–7381 (2010).
-
Renwick, S. et al. Modulating the developing gut microbiota with 2′-fucosyllactose and pooled human milk oligosaccharides. Microbiome 13, 44 (2025).
-
You, X., Rani, A., Özcan, E., Lyu, Y. & Sela, D. A. Bifidobacterium longum subsp. infantis utilizes human milk urea to recycle nitrogen within the infant gut microbiome. Gut Microbes 15, 2192546 (2023).
-
Mora, D. & Arioli, S. Microbial Urease in Health and Disease. PLoS Pathog. 10, e1004472 (2014).
-
Konieczna, I. et al. Bacterial urease and its role in long-lasting human diseases. Curr. Protein Pept. Sci. 13, 789–806 (2012).
-
Cantarel, B. L. et al. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 37, D233–D238 (2009).
-
Kiely, L. J., Busca, K., Lane, J. A., van Sinderen, D. & Hickey, R. M. Molecular strategies for the utilisation of human milk oligosaccharides by infant gut-associated bacteria. FEMS Microbiol. Rev. 47, fuad056 (2023).
-
Wang, H. et al. Integrated 16S rRNA sequencing and metagenomics insights into microbial dysbiosis and distinct virulence factors in inflammatory bowel disease. Front. Microbiol. 15, 1375804 (2024).
-
Arredondo-Hernandez, R., Siebe, C., Castillo-Rojas, G., Ponce de León, S. & López-Vidal, Y. The synergistic interaction of systemic inflammation, dysbiosis and antimicrobial resistance promotes growth restriction in children with acute severe malnutrition: An emphasis on Escherichia coli. Front. Antibiot. 1, 1001717 (2022).
-
Shao, Y. et al. Stunted microbiota and opportunistic pathogen colonization in caesarean-section birth. Nature 574, 117–121 (2019).
-
Casaburi, G. & Frese, S. A. Colonization of breastfed infants by Bifidobacterium longum subsp. infantis EVC001 reduces virulence gene abundance. Hum. Micro. J. 9, 7–10 (2018).
-
Samarra, A. et al. Maternal-infant antibiotic resistance genes transference: What do we know? Gut Microbes 15, 2194797 (2023).
-
van Best, N. et al. Bile acids drive the newborn’s gut microbiota maturation. Nat. Commun. 11, 3692 (2020).
-
Wang, Y. et al. Gut microbiota dysbiosis is associated with altered bile acid metabolism in infantile cholestasis. mSystems 4, e00463–19 (2019).
-
Whitfield, K. C. et al. Thiamine deficiency disorders: diagnosis, prevalence, and a roadmap for global control programs. Ann. N.Y. Acad. Sci. 1430, 3–43 (2018).
-
Yu, K. et al. Bacterial indole-3-lactic acid affects epithelium–macrophage crosstalk to regulate intestinal homeostasis. Proc. Natl. Acad. Sci. USA 120, e2309032120 (2023).
-
Meng, D. et al. Indole-3-lactic acid, a metabolite of tryptophan, secreted by Bifidobacterium longum subspecies infantis is anti-inflammatory in the immature intestine. Pediatr. Res. 88, 209–217 (2020).
-
Duar, R. M., Kyle, D. & Casaburi, G. Colonization resistance in the infant gut: the role of B. infantis in reducing pH and preventing pathogen growth. High. Throughput 9, 7 (2020).
-
Henrick, B. M. et al. Elevated fecal pH indicates a profound change in the breastfed infant gut microbiome due to reduction of Bifidobacterium over the past century. mSphere 3, e00041–18 (2018).
-
Measelle, J. R. et al. Thiamine supplementation holds neurocognitive benefits for breastfed infants during the first year of life. Ann. N.Y. Acad. Sci. 1498, 116–132 (2021).
-
Hill, D. A. & Spergel, J. M. The atopic march. Ann. Allergy Asthma Immunol. 120, 131–137 (2018).
-
Leal Rodríguez, C. et al. The infant gut virome is associated with preschool asthma risk independently of bacteria. Nat. Med. 30, 138–148 (2024).
-
Rivera-Calzada, A., Famelis, N., Llorca, O. & Geibel, S. Type VII secretion systems: structure, functions and transport models. Nat. Rev. Microbiol. 19, 567–584 (2021).
-
Bergmann, S. & Hammerschmidt, S. Versatility of pneumococcal surface proteins. Microbiology 152, 295–303 (2006).
-
Raetz, C. R. H. & Whitfield, C. Lipopolysaccharide endotoxins. Annu. Rev. Biochem. 71, 635–700 (2002).
-
Stewart, C. J. et al. Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature 562, 583–588 (2018).
-
Arrieta, M.-C. et al. Early infancy microbial and metabolic alterations affect risk of childhood asthma. Sci. Transl. Med. 7, 307ra152 (2015).
-
Gellman, R. H. et al. Hadza Prevotella require diet-derived microbiota-accessible carbohydrates to persist in mice. Cell Rep. 42, 113233 (2023).
-
Moraïs, S. et al. Cryptic diversity of cellulose-degrading gut bacteria in industrialized humans. Science 383, eadj9223 (2024).
-
Krueger, F. Trim Galore. https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ (Babraham Institute).
-
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
-
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
-
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
-
Ondov, B. D. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17, 132 (2016).
-
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
-
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495 (2014).
-
Liu, B., Zheng, D., Zhou, S., Chen, L. & Yang, J. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Res. 50, D912–D917 (2022).
-
Aitchison, J. The statistical analysis of compositional data. J. R. Stat. Soc. Ser. B Stat. Methodol. 44, 139–160 (1982).
-
Martín-Fernández, J. A., Barceló-Vidal, C. & Pawlowsky-Glahn, V. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Math. Geol. 35, 253–278 (2003).
-
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
-
Harlow, T. J., Gogarten, J. P. & Ragan, M. A. A hybrid clustering approach to recognition of protein families in 114 microbial genomes. BMC Bioinform. 5, 45 (2004).
-
Van Dongen, S. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).
-
Enright, A. J. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
Author information
Contributions
J.B.J. contributed to study design, study management, data analysis, and manuscript preparation. P.J.T. contributed to bioinformatics pipeline development, study management, data analysis, and manuscript preparation. S.S. contributed to manuscript preparation, study management, and data analysis. H.S. contributed to bioinformatics pipeline development and data analysis. C.S., A.L., S.P., and N.L.J. contributed to sample preparation and sequencing. R.H. contributed to study design. R.I. contributed to data analysis and manuscript preparation. S.V.D. contributed to study design, study management, data analysis, and manuscript preparation. S.J.C. contributed to study design, study management, data analysis, and manuscript preparation.
Ethics declarations
Competing interests
Research was funded by Persephone Biosciences. All authors were employed by and/or hold stock in Persephone Biosciences. R.I. also holds stock in Kenvue.
Peer review
Peer review information
Communications Biology thanks M. Andrea Azcarate-Peril, Martin Frederik Laursen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Sabina Leanti La Rosa and Tobias Goris.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jarman, J.B., Torres, P.J., Stromberg, S. et al. Bifidobacterium deficit in United States infants drives prevalent gut dysbiosis. Commun Biol 8, 867 (2025). https://doi.org/10.1038/s42003-025-08274-7
-
DOI: https://doi.org/10.1038/s42003-025-08274-7