Multipathogen protein microarray, cohort, and sample characteristics. The multipathogen array was developed at Antigen Discovery Inc. using an established high throughput cloning and protein expression system (Figure 1). Arrays were designed to include approximately 80 proteins from each of the enteric, respiratory, and sepsis-related pathogens responsible for most morbidity and mortality from infectious diseases in children under 5 years, as well as 5–15 proteins from other pathogens relevant to global health. Selection of proteins followed 2 approaches: (a) an empirical approach utilizing the databases from prior studies performed at Antigen Discovery Inc., and (b) a hypothetical approach using in silico prediction of antigenic targets and orthologues of confirmed antigenic targets already identified in ADI databases. Proteins were selected for inclusion based on seroprevalence rates and correlation with exposure to pathogens, or where limited data were available, homology with other antigens. The final multipathogen protein microarray included 1,607 proteins from 30 different pathogens (Table 1).

Figure 1 Multipathogen protein microarray principle. Open reading frame (ORF) expression clone libraries can be constructed from any genome sequence and corresponding source of genomic DNA using high-throughput PCR/recombination cloning. Proteins encoded by the cloned ORF plasmids are expressed using a cell-free in vitro transcription/translation (“IVTT”) system. Each protein is expressed and printed individually onto microarray slides. With as little as 2–5 μL of serum and 100 μL of defatted human milk, complete or partial proteomes can be screened for antibody binding. Isotype-specific bound antibodies are detected using a fluorescently labeled secondary antibody. Using a fluorescence microarray scanner, signal intensities from protein microarrays are acquired and checked for quality, followed by statistical analysis.

Table 1 Multipathogen protein microarray selection of pathogens and protein spot allocation

In total, 878 human milk samples (67 [7%] colostrum and 811 [93%] mature milk) collected from 695 women in Finland, the U.S., Pakistan, Peru, and Bangladesh were assessed (Figure 2 and Supplemental Table 1; supplemental material available online with this article; https://doi.org/10.1172/JCI168789DS1). In addition, 94 matched maternal serum samples came from 60 Bangladeshi mothers at 12 weeks postpartum and 34 Peruvian mothers at 6 weeks postpartum.

Figure 2 Flowchart of samples utilized in the study. Human milk samples from 6 cohorts in 5 countries were available for assessment of antibody profiles utilizing a protein microarray. As a validation cohort, we sourced an independent set of samples from a mother-infant paired birth cohort from Bangladesh (Cryptosporidium Burden Study) to probe against a mini protein microarray. All the samples available from the smaller studies were assayed at desired time points, and for larger cohorts (200 or more samples), approximately one-third of the samples were selected as representative of the whole cohort or based on the case-control design to include positive and negative infant infectious outcomes where available.

Profiles of human milk IgA and IgG antigen binding differ by economic and geographic region. First, to assess the overall differences between economic regions in IgA and IgG antibody repertoires, principal component analysis was performed separately on colostrum samples and mature milk samples for both IgA and IgG, and the responses to enteric, respiratory, and bloodborne pathogen proteins were analyzed independently (Figure 3). IgA responses in colostrum and mature milk between LMIC and HIC populations were most clearly delineated for enteric pathogens. Mature milk IgA responses against respiratory and sepsis-related pathogens were also significantly different between LMICs and HICs. IgG responses against all 3 types of pathogens clustered tightly by economic classification for both colostrum and mature milk. PCA results by individual countries are shown in Supplemental Figure 1. These data further indicate differences in antibody profiles within the HIC and LMIC cohorts.

Figure 3 Principal component analysis by economic classification for colostrum and mature human milk. The scatter plots show principal component (PC) values for each individual’s IgA and IgG responses as points colored by economic region. The top row of plots shows responses against enteric pathogen antigens, the second row against respiratory pathogen antigens, and the third row against sepsis-related pathogen antigens. The 2 leftmost columns of plots show colostrum IgA and IgG responses, and the 2 rightmost columns show mature milk IgA and IgG responses. Samples from HICs are shown as orange points, and samples from low- and middle-income countries (LMICs) are shown as blue points. Colostrum was available from Finland (n = 15) and Pakistan (n = 49); 1 mature milk sample per mother was available from Finland (n = 85), Rochester, New York, U.S. (n = 23), Peru (n = 34), Bangladesh (n = 246), and Pakistan (n = 49). t test P values for PC comparisons between HIC and LMIC are shown in captions below each plot. The 2 PCs with the lowest P values (PC1, PC2 or PC3) for each comparison were plotted.

To understand the potential difference in per-pathogen antigenic coverage between HIC and LMIC, an IgA and IgG antibody “breadth score” for each of the enteric, respiratory, and sepsis pathogens was assessed. Breadth score was calculated as the sum of seropositive responses (normalized signal intensity ≥ 1.0) per pathogen divided by the total number of probes for the corresponding pathogen, i.e., the proportion of positive probes. Comparisons of the distributions of antibody breadth scores between LMIC and HIC populations for enteric, respiratory, and sepsis-related pathogens are summarized in Figure 4. Mature milk IgA and IgG from mothers in LMICs were reactive to a higher number of enteric antigens (except for cholera) than those from HICs (Figure 4). Whereas IgG breadth scores were higher for several enteric pathogens in LMICs than HICs, colostrum IgA breadth scores were comparable. Further comparisons of IgA breadth scores between countries showed that women in Finland did not significantly differ from those in the U.S., but were most different from those in Bangladesh, followed by Pakistan and then Peru, which were all similar (Supplemental Figure 2). Pair-wise comparison of mature milk IgG between geographic regions showed findings similar to IgA, with Shigella being the most notable pathogen with a higher breadth score in the LMIC than HIC populations (Supplemental Figure 3).
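
The breadth score definition above can be illustrated with a minimal sketch. The dictionary layouts, the function name `breadth_scores`, and the toy signal values are assumptions made for illustration only; the cutoff of 1.0 on normalized signal intensity is the one stated in the text.

```python
from collections import defaultdict

SEROPOSITIVE_CUTOFF = 1.0  # normalized signal intensity threshold stated in the text


def breadth_scores(signals, probe_pathogen):
    """Return {pathogen: proportion of seropositive probes} for one sample.

    signals: {probe_id: normalized signal intensity} (hypothetical layout)
    probe_pathogen: {probe_id: pathogen name} (hypothetical layout)
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for probe_id, pathogen in probe_pathogen.items():
        totals[pathogen] += 1
        # A probe missing from `signals` is treated as non-reactive (assumption).
        if signals.get(probe_id, 0.0) >= SEROPOSITIVE_CUTOFF:
            positives[pathogen] += 1
    return {pathogen: positives[pathogen] / totals[pathogen] for pathogen in totals}


# Toy data, invented for illustration: 4 Shigella probes and 2 RSV probes.
probe_pathogen = {
    "s1": "Shigella", "s2": "Shigella", "s3": "Shigella", "s4": "Shigella",
    "r1": "RSV", "r2": "RSV",
}
signals = {"s1": 2.3, "s2": 1.0, "s3": 0.4, "s4": 0.9, "r1": 0.2, "r2": 1.7}
scores = breadth_scores(signals, probe_pathogen)
```

In this toy sample, 2 of 4 Shigella probes and 1 of 2 RSV probes meet the cutoff, so both breadth scores are 0.5. Because the score is a proportion, it can be compared across pathogens that have different numbers of probes on the array.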

Figure 4 Pathogen-specific IgA and IgG breadth scores in mature milk and colostrum by economic classification. The box plots show comparisons of mature milk and colostrum IgA and IgG breadth scores (row headers), defined as the proportion of seropositive (normalized signal ≥ 1.0) antigens per pathogen for each individual (e.g., 20 of 80 positive responses = 0.25 breadth score). The column headers indicate the type of pathogens displayed in each column: enteric, respiratory, and sepsis-related pathogens. Rotavirus and Adenovirus 40/41 were omitted from the enteric pathogens column due to low numbers of reactive antigens. The x axes show each pathogen grouped by HIC (orange boxes) and LMIC (blue boxes) classifications. The y axes show the IgA or IgG breadth scores on a logarithmic scale with the boxes representing the median and interquartile range. Significant differences by Wilcoxon’s rank sum tests are shown by blue asterisks below each pathogen: *P ≤ 0.05, **P ≤ 0.005, ***P ≤ 0.0005. Mature milk samples (n = 438) were included from the latest sample collection for each cohort and were at least 6 weeks postpartum. Colostrum samples (n = 64) were collected in the first 5 days. E. coli, diarrheagenic types EAEC, EPEC and ETEC; RSV, respiratory syncytial virus; GBS, group B Streptococcus.

Among the respiratory pathogens, LMICs had significantly higher mature milk IgA breadth scores for influenza A/B, Bordetella pertussis, pneumococcus, and Mtb, while the IgG breadth scores in the mature milk or colostrum were higher for influenza A/B, and particularly for pneumococcus and RSV (Figure 4). Pair-wise comparisons between countries did not show differences for mature milk IgA breadth scores (Supplemental Figure 4), but women from Bangladesh and Pakistan both had higher IgG breadth scores for pneumococcus and RSV than Finland and the U.S. (Supplemental Figure 5). Peru had pneumococcal breadth scores similar to the HICs.

There were few notable differential breadth scores among the sepsis-related pathogens (Figure 4). Mature milk from women from LMICs had higher IgA and IgG breadth scores to Klebsiella pneumoniae proteins, although IgG breadth scores were extremely low for both LMICs and HICs. Sepsis pathogen breadth scores were comparable between countries (Supplemental Figures 6 and 7). All statistical results are available in Supplemental Tables 1–3.

In summary, these data show that, compared with HIC, IgA and IgG in milk of women in LMICs are reactive to a higher number of antigens per pathogen for all enteric pathogens tested. LMIC women’s milk antibodies are also more reactive to a higher number of antigens from respiratory pathogens, but the reactive isotype differs between pathogens with IgG being reactive to more numerous RSV and pneumococcus antigens. HIC mothers do not have antibodies that are reactive to a higher number of antigens from any pathogens.

Antibody responses in human milk are distinct from maternal serum responses. To compare the mucosal and the systemic antibody responses across enteric, respiratory, and sepsis pathogens, paired milk and serum samples available from Peru and Bangladesh cohorts were assessed for IgA and IgG responses (Figure 2 and Supplemental Table 1). We first calculated the total number of reactive antigens of IgA and IgG for each specimen type (human milk or serum) (Figure 5). In general, Shigella, EPEC, pneumococcus, and Staphylococcus elicited both IgA and IgG responses in serum and milk; typically, serum antibodies were reactive to more antigens than those in milk and more reactive for IgG than for IgA. However, Cryptosporidium, Campylobacter jejuni, Klebsiella pneumoniae, GBS, and Acinetobacter baumannii had more antigens reactive to serum IgA than IgG responses, and among those, Campylobacter jejuni, Klebsiella pneumoniae, and Acinetobacter baumannii elicited a predominant IgA response in both milk and serum. Milk IgG responses were most rare and were not elicited at all for several pathogens, including Salmonella, Cryptosporidium, Campylobacter jejuni, adenovirus, pertussis, Mtb, Klebsiella pneumoniae, and Acinetobacter baumannii, or they were relatively poor, as in the case of ETEC and EAEC. There were no pathogens that generated both an IgA and IgG antibody response in human milk but not in serum. The viruses, adenovirus 40 and 41, influenza A and B, rotavirus A and RSV, elicited predominantly serum IgG and IgA responses.

Figure 5 Comparison of antigen-specific recognition of IgA and IgG for enteric, respiratory, and sepsis pathogens in human milk and serum. The horizontal bar plots show the number of antigens bound by IgA for each pathogen by sample type and antibody isotype. Pathogens are shown on the y axis, grouped by disease category, with the total number of antigens (“Ag”) present on the multipathogen protein microarray in parentheses. The x axis shows the number of antigens from each pathogen that were reactive. Reactive antigens were defined as antigens with median IgA concentrations of at least 1.0 in normalized signal intensity. Only samples from the Peru and Bangladesh (MDIG) cohorts, which had paired serum and human milk samples at 12 weeks or later postpartum (n = 93 participants; 1 of the 94 participants with paired serum and milk samples did not have a later mature milk sample), were included in this analysis. Note that the total number of antigens differs between the pathogens and therefore the number of reactive antigens does not reflect relative reactivities between the pathogens.
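
The reactive-antigen tally described in this caption can be sketched as follows. This is an illustrative example only: the data layout, the function name `reactive_antigen_counts`, and all intensity values are hypothetical. The only element taken from the caption is the rule that an antigen counts as reactive when its median normalized signal intensity is at least 1.0.

```python
from collections import Counter
from statistics import median

REACTIVE_CUTOFF = 1.0  # median normalized signal intensity threshold from the caption


def reactive_antigen_counts(intensities, antigen_pathogen):
    """Count reactive antigens per pathogen for one specimen type and isotype.

    intensities: {antigen_id: [normalized signal per participant]} (hypothetical layout)
    antigen_pathogen: {antigen_id: pathogen name} (hypothetical layout)
    """
    counts = Counter()
    for antigen_id, values in intensities.items():
        if median(values) >= REACTIVE_CUTOFF:
            counts[antigen_pathogen[antigen_id]] += 1
    return dict(counts)


# Invented values for three antigens measured in three participants.
intensities = {
    "EPEC_a": [1.2, 0.8, 1.5],  # median 1.2, reactive
    "EPEC_b": [0.1, 0.3, 2.0],  # median 0.3, not reactive
    "GBS_a": [1.0, 1.0, 0.2],   # median 1.0, reactive (cutoff is inclusive)
}
antigen_pathogen = {"EPEC_a": "EPEC", "EPEC_b": "EPEC", "GBS_a": "GBS"}
milk_iga_counts = reactive_antigen_counts(intensities, antigen_pathogen)
```

Running the same function once per specimen type and isotype (milk IgA, milk IgG, serum IgA, serum IgG) would give the four counts per pathogen that a bar plot of this kind displays.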

Additionally, the overlap in reactivity for each specific antigen across specimen types and isotypes was tallied and visualized using “upset” plots (Supplemental Figure 8 and Supplemental Table 4) (31). Shigella spp., EPEC, pneumococcus, Staphylococcus, and GBS had the most numerous IgA- and IgG-reactive antigens shared between serum and milk. There were no pathogens where an antigen would have elicited all but serum IgG responses or all but serum IgA responses. While there are several pathogens where a serum-only response without milk response was seen, only EAEC and Salmonella had some antigens that reacted only with milk IgA and no pathogens where only a milk IgG response was seen. Cryptosporidium spp., Campylobacter jejuni, Klebsiella pneumoniae, and Acinetobacter baumannii had a predominant IgA response in both milk and serum, with most milk IgA-reactive antigens also being reactive to serum IgA, but with some antigens uniquely recognized by serum IgA. Adenovirus and influenza tended to have antigens uniquely reactive to serum IgG or serum IgA.

In summary, these data suggest differences between pathogens in eliciting IgA versus IgG antibody responses, which are likely in part due to the invasiveness of the pathogen, with Shigella, EPEC, pneumococcus and Staphylococcus as examples of those with broad antibody responses across both isotypes in milk and serum, whereas Cryptosporidium and Campylobacter elicited predominantly an IgA response.

The magnitude of human milk IgA and IgG responses is regulated by the duration of lactation, economic and geographic region, total immunoglobulin concentrations, BMI, and parity. To assess the factors associated with antibody concentrations, we first analyzed the effect of duration of lactation utilizing longitudinal human milk samples available from the Finland, Pakistan, and Peru cohorts (Figure 2 and Supplemental Table 1). Total and specific IgA antibody concentrations decreased significantly over the first 12 to 14 weeks of lactation in each cohort (Figure 6, A and B). Utilizing samples from Pakistan, we were able to establish that total IgA declined significantly in the first 6 weeks but not from weeks 6 to 14 (P < 1 × 10⁻¹¹ versus P = 0.9), whereas aggregate specific IgA for 284 antigens decreased significantly over both intervals (P < 1 × 10⁻⁵ versus P = 0.002). The comparison of concentrations of specific antibody responses between time points is further illustrated in Figure 6, C–E. Compared to IgA, total and antigen-specific concentrations of IgG were lower throughout lactation, and there were fewer shifts in IgG concentrations (Supplemental Figure 9). In samples from women in Pakistan, there was only a small, although significant, decrease in the concentrations of both total and specific IgG from 0 to 6 weeks (P = 0.003 and P = 0.02, respectively). In Finland and Peru, there were no significant changes in total IgG or specific IgG concentrations.

Figure 6 Total IgA and pathogen-specific IgA concentrations decline from colostrum to mature human milk. (A and B) The line plots show (A) μg/mL of total IgA in human milk and (B) the mean log2 signal intensity of IgA antibodies specific for 294 reactive pathogen antigens on the multipathogen protein microarray over 12 to 14 weeks postpartum in Finland (n = 15 subjects), Pakistan (n = 49 subjects), and Peru (n = 9 subjects). The vertical bars represent the SEM. Paired t test P values are shown between time points and colored according to cohort. (C–E) The volcano plots show the difference between pathogen-specific IgA concentrations between time points for (C) Finland and (D and E) Pakistan. Comparison of samples from Peru is not shown due to low number of week 0 colostrum samples (n = 3). Each marker represents an antigen on the multipathogen protein microarray; red open triangles represent IgA responses to individual antigens that are significant after correction for the FDR and black open circles represent IgA responses to individual antigens that were not statistically significant. The x axes show mean differences between time points, and the y axes show the inverse log10 P value from paired t tests.

To assess the effect of economic and geographic regions on the magnitude of antibody responses we performed group-wise comparisons of the normalized signal intensities of antibody binding to each protein (Figure 7). LMIC cohorts had higher IgA and IgG antibody concentrations in both mature milk and colostrum most notably for IgA and IgG to Shigella and diarrheagenic E. coli, and IgG to pneumococcal and staphylococcal proteins compared with HIC cohorts. In turn, HIC populations had higher responses in mature milk to a few individual antigens from Staphylococcus.

Figure 7 Pathogen-specific IgA and IgG concentrations in mature milk and colostrum by economic region. Scatter plots show IgA or IgG concentrations for each of the reactive antigens from enteric, respiratory, and sepsis-related pathogens (row headers) in mature milk and colostrum (column headers). Antigens classified as “reactive” were those having a median value ≥ 1.0 across the entire study population. Each point represents the mean normalized signal intensity for an individual antigen, colored by pathogen (row legends); solid triangles represent antigens with significant differential reactivity between cohorts by t tests after correction for the FDR. Y axes show means for samples from LMICs, and x axes show means for samples from HICs. The countries included in each plot are listed along the y-axis (LMICs) and x-axis (HICs) with sample sizes in parentheses. The solid diagonal line represents the line of identity (i.e., similar mean signal intensity between LMICs and HICs). Mature milk samples were those from the latest sample collection for each cohort and were at least 6 weeks postpartum. Colostrum samples were collected at 0 weeks. E. coli, diarrheagenic types EAEC, EPEC and ETEC; RSV, respiratory syncytial virus; GBS, group B Streptococcus.

Pairwise comparisons between countries for enteric pathogen proteins further demonstrate the similarities and differences within and between HIC and LMIC populations with key differences driven by Shigella for IgA and IgG, and E. coli, Cryptosporidium, and cholera for IgA (Supplemental Figures 2 and 3). For the respiratory pathogens, the differences were largely driven by pneumococcal, Klebsiella, GBS, and Mtb antigens for IgA and pneumococcal antigens for IgG (Supplemental Figures 4 and 5). Regarding sepsis pathogens, HICs and LMICs differed for staphylococcal IgG antibody concentrations (Supplemental Figures 6 and 7).

Lastly, we utilized mature human milk samples to understand the effect of maternal factors such as parity, nutritional status, age, and education on IgA and IgG antibody concentrations against all pathogen proteins on the multipathogen protein microarray using multivariable linear mixed effects regression and ordinary least squares (OLS) regression (Supplemental Figures 10 and 11). We used mature milk samples because the concentrations stabilized after 6 weeks of lactation (Figure 6). Besides economic region, human milk total IgA and BMI were negatively associated with aggregate IgA antibody concentrations (i.e., mean of reactive antigens), although the effect size for human milk total IgA was small (Table 2). For IgG antibodies, parity was negatively associated and total IgG positively associated with the IgG antibody concentrations (Supplemental Table 2). Data on maternal age and highest level of education were only available in LMICs and showed no associations with antibody concentrations. In summary, besides the differences by economic regions and decrease over the duration of lactation, human milk antibody concentrations were lower in mothers with high BMI and parity.

Table 2 Biological and environmental predictors of IgA antibody concentrations against all pathogen proteins on the multipathogen protein microarray in mature human milk

Human milk IgA antibodies are associated with protection against rotavirus but greater risk of Campylobacter infection. Infection and disease outcomes in breastfed infants were available for the 2 Bangladesh cohorts. For the PROVIDE cohort, human milk samples collected at approximately 1 week postpartum from women whose infants had at least 1 episode of enteric illness were analyzed for association of antibodies with enteric pathogen outcomes during the first year of life. In the MDIG cohort, human milk collected at approximately 13 weeks postpartum from women whose infants had at least 1 nasal swab collected after the human milk sample was taken, as well as evidence of partial, predominant, or exclusive breastfeeding at least until the time of the index swab collection, were analyzed for association with subsequent influenza and RSV infections through the first 6 months of life. Importantly, samples from the MDIG study were selected and stratified into 2 equally sized groups on the basis of whether the infant had a nasal swab collected that tested positive for RSV and/or influenza A and/or influenza B after collection of the human milk sample. Among infants in the cohort with one or more microbiologically confirmed nasal swab for RSV and/or influenza A/B, mother-infant pairs were preferentially selected for this case-control study if the swab tested positive for RSV and if the cases/episodes had a more severe clinical presentation (i.e., lower respiratory tract infection). Controls were mother-infant pairs where the infant had any type of clinical acute respiratory infection for which the nasal swab tested negative for RSV and influenza A/B after collection of the human milk sample. Pathogens in both the PROVIDE and MDIG studies were detected by PCR-based assays. Additionally, in the PROVIDE cohort, infections were classified as causative of disease or not. 
Analysis of the association between IgA response and the odds of corresponding pathogen infection by logistic regression showed few significant associations among diarrheal and respiratory pathogens (Figure 8A), although a general trend was observed for higher antibodies in human milk associated with infants that had subsequent enteric infections. Because there was a long period of follow-up between early human milk sampling and illness outcome in the PROVIDE samples (up to the second half of the first year of life for some) (Figure 8B) and because it was not possible to determine whether infants without infection were resistant to the pathogen or remained unexposed, we performed an exploratory analysis to assess if there were associations between human milk antibodies and time until infection with enteric pathogens or attributable illness. This approach has been useful in other disease models with heterogeneous exposure in the population (32). We used multivariable Cox proportional hazards (CPH) models to explore associations of antibodies and the hazard function, correcting for sex, days of exclusive breastfeeding, the WHO child growth standard length-for-age z-score (LAZ) at enrollment, parity, maternal age, maternal BMI, maternal education in years, household income/expenditures, and ordinal category of improved drinking water treatment. For each antigen, IgA and IgG responses were categorized by the median as top or bottom-half responders (33). Among those antigens for which positive IgA responses were seen in at least 10% of women, there was a trending association between higher human milk IgA responses against rotavirus, adenovirus 40/41, and Shigella and a lower incidence of specific infection in the infant, i.e., the most significant associations had hazard ratios below 1, indicating a protective effect (Figure 8C).
After correction for the FDR, the association for 3 Rotavirus A antigens remained significant: the top half of IgA responders to the VP4 outer capsid protein, nonstructural protein 5, and VP1 RNA-directed RNA polymerase (RdRp) had infants with a significant delay in time to infection. An example Kaplan-Meier plot for Rotavirus A VP4 is shown in Figure 8D. IgA responses to these Rotavirus A proteins, adenovirus 40/41, and Shigella antigens were not significantly associated with a delay to diarrheal disease caused by these pathogens in the infant (Supplemental Figure 12, A and B), with the exception of Rotavirus A VP4, which was similarly associated with a delayed time to Rotavirus A–attributable diarrhea before correction for the FDR (Supplemental Figure 12C). In contrast, the top half of Campylobacter IgA responders had a child with a higher risk of infection and diarrheal disease caused by Campylobacter, and similar trends were observed for EPEC infection and ETEC-attributable diarrhea; however, these were not significant after correction for the FDR (Figure 8C and Supplemental Figure 12). An example Kaplan-Meier plot for specific IgA to Campylobacter jejuni Cj0596 major antigenic peptide PEB-cell binding factor, also known as PEB4, is shown in Figure 8E for PCR-confirmed infection and Supplemental Figure 12D for attributable disease. Inclusion of all antigens in the models, even those for which less than 10% of women had an IgA response, showed a similar finding that higher human milk IgA concentrations against Rotavirus A proteins were significantly associated with a delayed time to infection in the infant, and higher IgA concentrations against Campylobacter proteins were significantly associated with a reduced time to infection, even after correction for the FDR (Supplemental Figure 13).
For comparison, we fit CPH models on all 256 human milk samples irrespective of whether infants had confirmed exposures or not, and, like the logistic regression models in Figure 8A, the strongest associations for IgA antibody concentrations were with increased risk of infection and diarrheal illness (Supplemental Figure 14).
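
The median split and the time-to-infection comparison described above can be illustrated with a small sketch. It is not the multivariable CPH model used in the analysis: it covers only the median split into top and bottom halves and an unadjusted Kaplan-Meier estimate per half. The function names, the handling of values tied at the median, and all data values are assumptions made for illustration.

```python
from statistics import median


def split_by_median(responses):
    """Label each mother 'top' or 'bottom' relative to the cohort median.

    Values equal to the median go to 'bottom' (an assumption; the text does
    not say how ties were handled).
    """
    cut = median(responses.values())
    return {k: ("top" if v > cut else "bottom") for k, v in responses.items()}


def kaplan_meier(times, events):
    """Return [(time, infection-free probability)] at each observed event time.

    times: days to infection or to censoring
    events: True if infection was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        infected = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)
        if infected:
            survival *= 1 - infected / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Invented data for four mother-infant pairs.
iga = {"m1": 0.2, "m2": 0.9, "m3": 1.4, "m4": 2.1}
days = {"m1": 60, "m2": 120, "m3": 200, "m4": 365}
infected = {"m1": True, "m2": True, "m3": True, "m4": False}

groups = split_by_median(iga)
curves = {}
for label in ("top", "bottom"):
    ids = [k for k, g in groups.items() if g == label]
    curves[label] = kaplan_meier([days[k] for k in ids], [infected[k] for k in ids])
```

In this toy data the bottom half reaches an infection-free probability of 0 by day 120, while the top half is still at 0.5 after day 200. That is the pattern a hazard ratio below 1 for the top half would summarize.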

Figure 8 Association of human milk IgA with infection in breastfed infants. (A) Association of IgA binding to each pathogen for infants subsequently infected with the specific pathogen or not. The log odds from logistic regression of enteric (left) or respiratory (right) infection with increasing IgA binding (x axis) in infants during 1 year and 6 months of follow-up, respectively, are shown with the inverse log10 P value (y axis). Associations significant after correction for the FDR are shown in colored triangles. Samples analyzed for diarrheal illness were from the Bangladesh PROVIDE cohort (n = 256) and for respiratory illness were from the Bangladesh MDIG cohort (n = 246). (B) Survival curves of 256 infants from the PROVIDE cohort for enteric pathogens detected by PCR. (C) Hazard ratios of infants during the first year of life divided into the top and bottom halves of mothers’ milk IgA responses for each antigen that was reactive in at least 10% of PROVIDE cohort women. Milk samples included were from mothers with infants that subsequently had pathogen-specific infection. Values below 1.0 represent lower risk of infection in the top half of milk IgA responses compared with the bottom half. Antigens with unadjusted P values less than 0.05 were colored (otherwise grey), and those with FDR-adjusted P values less than 0.05 were plotted as triangles. (D and E) Representative Rotavirus A antigen (D) and Campylobacter jejuni antigen (E) corresponding to the samples included in the models shown in the volcano plot (C). The risk tables show the number at risk during 100-day intervals. The Rotavirus A VP4 outer capsid protein is representative of antibodies associated with longer time to infection, while the C. jejuni PEB4 major antigenic peptide (Cj0596) represents antibodies associated with a shorter time to infection. HR, Cox model coefficient for the hazard ratio; CI, confidence interval; P, log-rank test P value; FDR P, adjusted P value.

To assess generalizability of the finding that milk IgA responses were associated with delayed or shorter time to infection, we sourced an independent set of samples from a mother-infant paired birth cohort in the same region of Bangladesh (Cryptosporidium Burden Study, CBS (34)). These human milk samples were tested on a mini-protein microarray that included the Rotavirus A, Shigella, and Campylobacter proteins identified from the PROVIDE study as correlates of delay or reduction in time to infection or diarrheal illness (unadjusted P values < 0.05 in CPH models) (Figure 8C and Supplemental Figure 13). The concentration of IgA to Rotavirus A VP4 outer capsid protein was associated with a delayed time to infection in the infant (HR: 0.61, CI: 0.38–0.98, P = 0.042, P FDR = 0.2, n = 92 cases), and the concentration of IgA to Campylobacter jejuni PEB4 major antigenic protein was associated with a reduced time to infection in the child (HR: 1.79, CI: 1.2–2.7, P = 0.005, P FDR = 0.08, n = 116 cases) (Figure 9).
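
The following sketch shows how an unadjusted P value can become a larger FDR-adjusted value when several antigens are tested together. This section does not name the FDR procedure that was used, so the Benjamini-Hochberg step-up adjustment below is an assumption, and the P values in the example are invented rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted values
    # never decrease as the raw P value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented unadjusted P values for four antigens tested together.
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr_p = benjamini_hochberg(raw_p)
```

Here the smallest raw P value (0.01) is multiplied by the number of tests (4) to give 0.04, and the two middle values both adjust to about 0.053. The more antigens are included in a panel, the larger the adjustment for a given unadjusted P value.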