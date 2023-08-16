Identification of immunoreactive proteins. The Cryptosporidium species protein microarray created for this study comprised a total of 1,761 antigens representing 1,250 unique genes from C. parvum (n = 980), C. hominis (n =263), and C. meleagridis (n =7). C. parvum sequences were used as the backbone of the protein microarray due to the ready availability of C. parvum DNA and its superior assembly and annotation (12, 13) (Supplemental Figure 2). The selected C. parvum proteins included those that had been previously identified as potential vaccine candidates (Supplemental Table 1) (6, 14–34). Open reading frames (ORFs) over 3,000 base pairs were cloned as overlapping segments to optimize in vitro translation. The array also included 15 genetically variant regions of the gp60 gene common in this Bangladeshi population (35). Proteins with conserved sequences in the different Cryptosporidium species that account for the majority of cryptosporidiosis in humans (C. hominis, C. parvum and C. meleagridis) and that were annotated as having a signal peptide — and, thus, potentially a membrane protein and accessible to human antibodies — were prioritized for inclusion (Supplemental Figure 3).

The array was incubated with a 1:100 dilution of plasma collected at 1 year of age from 500 children in the cohort and developed with anti-human IgG (DyLight650, Bethyl Laboratories) and Cy3 AffiniPure F(ab′)2 or anti-human IgA. The distribution of normalized fluorescence signal intensity (SI) values of each antigen was analyzed using a mixture modeling technique (Supplemental Figure 4A) to identify the antigen-specific background component of the spot signal and hence the intensity cut-off value above which a sample was defined as seropositive (Supplemental Figure 4B).
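As an illustration of how a mixture model can separate background from specific signal for a single antigen, the following is a minimal sketch and not the procedure used in this study. It assumes a two-component Gaussian mixture fit by expectation-maximization, and it assumes a cut-off of the background mean plus 3 SD; neither the component distribution nor the cut-off rule is stated in the text. The function names and the toy intensities are hypothetical.

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns ((background_mean, background_sd), (signal_mean, signal_sd)),
    where "background" is simply the component with the lower mean.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    # Initialise the means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand = sum(xs) / n
    sd0 = math.sqrt(sum((x - grand) ** 2 for x in xs) / n)
    sd = [sd0, sd0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M step: update weights, means, and SDs from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            w[k] = nk / n
    lo, hi = sorted(range(2), key=lambda k: mu[k])
    return (mu[lo], sd[lo]), (mu[hi], sd[hi])

# Hypothetical normalized intensities for one antigen across 16 children.
signal = [-0.4, -0.3, -0.2, -0.15, -0.1, 0.0, 0.05, 0.1,
          0.15, 0.2, 0.3, 0.4, 4.5, 5.0, 5.5, 6.0]
(bg_mu, bg_sd), (sig_mu, sig_sd) = fit_two_gaussians(signal)
cutoff = bg_mu + 3 * bg_sd  # assumed cut-off rule, for illustration only
n_seropositive = sum(x > cutoff for x in signal)
```

In this toy example the four high-intensity samples fall above the assumed cut-off and would be called seropositive; a real analysis would need to check the fit for each antigen separately.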

Antigens were classified as seroreactive in this population if at least 10% of the children had either IgA (Figure 1A) or IgG (Figure 1B) antibodies against the antigen. Using this criterion, 36 antigens were seroreactive by both IgG and IgA (Supplemental Figure 5), 57 by IgA alone, and 140 by IgG alone (Figure 1C). The antigens recognized varied greatly between children, with each antigen recognized by only a subset of the responding infants (Figure 2).
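The 10% seroprevalence criterion and the resulting IgA-only, IgG-only, and shared groups can be expressed with simple set operations. The sketch below is illustrative only: the antigen IDs, the per-child seropositivity calls, and the function names are hypothetical, and it assumes seropositivity has already been called for each child and antigen.

```python
def seroprevalence(calls):
    """Fraction of children seropositive for one antigen (calls are booleans)."""
    return sum(calls) / len(calls)

def classify_antigens(iga_calls, igg_calls, threshold=0.10):
    """Group antigens by which isotype meets the seroprevalence threshold.

    iga_calls / igg_calls: dict mapping antigen ID -> list of per-child booleans.
    """
    iga = {a for a, c in iga_calls.items() if seroprevalence(c) >= threshold}
    igg = {a for a, c in igg_calls.items() if seroprevalence(c) >= threshold}
    return {"both": iga & igg, "iga_only": iga - igg, "igg_only": igg - iga}

# Toy data: 10 children, 3 hypothetical antigens.
iga_calls = {
    "ag1": [True] + [False] * 9,        # 10% of children: meets the threshold
    "ag2": [False] * 10,                # 0%
    "ag3": [True, True] + [False] * 8,  # 20%
}
igg_calls = {
    "ag1": [True, True, True] + [False] * 7,  # 30%
    "ag2": [True] + [False] * 9,              # 10%
    "ag3": [False] * 10,                      # 0%
}
groups = classify_antigens(iga_calls, igg_calls)
```

Applied to the array data, the three groups reported above together account for 36 + 57 + 140 = 233 seroreactive antigens.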

Figure 1 Humoral immunity to Cryptosporidium antigens was isotype specific. Immune responses are shown for (A) IgA and (B) IgG antibodies. The Y-axis is signal intensity after normalization and the X-axis shows Cryptosporidium antigens ranked by median signal intensity. Bars represent the interquartile range of each antibody response and are shown as red if antibodies were present in ≥ 10% of infants (seroprevalent). (C) The Venn diagram shows seroprevalent Cryptosporidium antigens with IgA- (green) and IgG-specific (orange) and overlapping immune responses.

Figure 2 Cryptosporidium antigens recognized by IgA and IgG antibodies. The proteomic microarray was used to measure the parasite-specific antibody response in the infants enrolled in our study cohort at 1 year of age. Previously infected children (columns) and the Cryptosporidium antigens (rows) that stimulated a strong IgG and/or IgA antibody response (present in ≥ 10% of the children; n = 232 antigens) are shown. The spot signals were normalized by first determining the specific background component by use of mixture models and setting this value to 0. The bar at the top of each heat map indicates the total number of Cryptosporidium antigens each child responds to (Antibody Breadth). The side bars indicate: (a) the seroprevalence of each antigen (% Sero+) and (b) presence of a membrane-targeting signal peptide (SP).

We found that orthologues encoded by C. parvum and C. hominis generated similar signals; for example, the signals for C. parvum Cp23 (cgd4_3620) and the C. hominis Cp23 orthologue (Chro.40414) were correlated (Pearson r = 0.844, P = 2.11 × 10^−119). In total, there were 124 IgG-reactive and 70 IgA-reactive C. parvum proteins. The antigens recognized were from multiple developmental stages of the Cryptosporidium parasite (36–38) (Supplemental Figure 2).
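The orthologue comparison rests on the Pearson correlation coefficient between paired signals from the same children. A minimal sketch of that calculation follows; the intensity values are hypothetical toy data, not values from the array, and the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical normalized signals for the same six children against
# the C. parvum and C. hominis versions of one antigen.
signal_parvum = [0.2, 1.1, 2.3, 0.0, 3.1, 1.8]
signal_hominis = [0.4, 0.9, 2.0, 0.1, 2.8, 2.1]
r = pearson_r(signal_parvum, signal_hominis)
```

A value of r close to 1 indicates that children who respond strongly to one orthologue also respond strongly to the other.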

Humoral immune response diminished with time from infection. The anti-Cryptosporidium IgA and IgG antibody profiles were analyzed using t-distributed stochastic neighbor embedding (t-SNE), an unsupervised data reduction method, to visualize trends in the antibody profile in relation to the number of days since the first Cryptosporidium-positive diagnostic qPCR and whether the child had a documented prior Cryptosporidium infection (Figure 3A). The antibody profile of children with more recent infections mapped to a distinct region within the t-SNE plot (Region 1, “R1”), while children with earlier infections did not cluster separately from children without a prior infection (Figure 3A). Analysis of the 100 most responsive antigens showed that the important factors influencing the antibody profile of children in R1 of the t-SNE plot were the strength of the immune response (Figure 3B) and its breadth, i.e., the number of parasite antigens recognized by IgA and IgG (Figure 3C) (39). Diminishing antibody responses over time were confirmed by a linear regression analysis in which, most notably, the breadth of both the IgA and IgG anti-Cryptosporidium immune responses significantly decreased over time (Figure 3D).
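The regression of antibody breadth on time since infection can be sketched as an ordinary least-squares fit of log10-transformed breadth against days. The example below uses hypothetical toy data and a hypothetical helper name. It also assumes every child has a breadth of at least 1, because log10(0) is undefined and the text does not say how zero counts were handled.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: days since the qPCR+ sample and antibody breadth
# (number of seropositive antigens among the 100 most reactive).
days = [10, 40, 90, 150, 220, 300]
breadth = [60, 45, 30, 22, 12, 9]
log_breadth = [math.log10(b) for b in breadth]
slope, intercept, r_squared = linear_fit(days, log_breadth)
```

A negative slope corresponds to breadth declining with time since infection, which is the pattern described above for both isotypes.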

Figure 3 Antibody responses waned with time after a Cryptosporidium infection. (A) The t-SNE plot identified a subset of children with a similar antibody profile. Each point corresponds to the immune profile of a child. Gray squares indicate children where no previous Cryptosporidium infections were identified by qPCR in clinical or surveillance stool samples (“qPCR–”), and orange circles represent children that had previous infections detected by qPCR (“qPCR+”), with the intensity of the overlaid color indicating the days since the last Cryptosporidium qPCR+ stool sample was identified. A group of infants had similar antibody profiles and a high density of recent infections (R1). (B) The split violin plot of antibody signals against the 100 most-reactive antigens (Y-axis) for each isotype (X-axis) shows the responses of children within the R1 region of the t-SNE plot compared with the remainder of the samples in R2. The median and quartile values are shown as horizontal lines in each split violin. (C) The split violin plot shows the same comparison as (B) using the antibody breadth (count of seropositive responses) among the 100 most-reactive antigens. P values above each split violin were calculated using linear mixed effects regression (LMER) for B and Wilcoxon’s rank sum tests for C. (D) Antibody breadth among the 100 most-reactive antigens for each isotype is shown on the Y-axis after log10 transformation, with the interval (days) between the last Cryptosporidium qPCR+ diagnostic assay and the time of antibody measurement shown on the X-axis. Linear regression P values and R² values are shown for IgG and IgA, as well as a line and confidence intervals (colored bands; pink for IgA and green for IgG) fit to each. (E and F) PLS-DA is shown for IgA and IgG responses, respectively. Each point corresponds to the immune profile of a child. The purple circles indicate the antibody response obtained from plasma that was collected from children where none of the stool samples (diarrheal or surveillance) collected during the first year of life, prior to the plasma sampling time point, were ever qPCR positive for Cryptosporidium parasites (“Yr0-1 qPCR–”). Green triangles indicate that the child had a verified Cryptosporidium subclinical or symptomatic infection (“Yr0-1 qPCR+”). The percentage of the variation in the child’s antibody profile accounted for by each axis is indicated.

Impact of prior infection on antibody response. While the effect of time since Cryptosporidium infection on antibody levels was significant, it did not completely explain the failure to generate an anti-Cryptosporidium immune response in all cases. To focus on the impact of prior exposure to the Cryptosporidium parasite, the data were analyzed using partial least squares discriminant analysis (PLS-DA) (Figure 3, E and F). This analysis demonstrated that a subset of children with prior infection differed from the population of children with no prior infection, but that a substantial proportion of children with an earlier infection had antibody profiles similar to children without a documented prior infection. We concluded that some infections in this cohort were missed, despite the active surveillance system in place.

To explore whether malnutrition or inflammation impacted the humoral immune response, we examined whether the immune profile correlated with growth failure (a measure of chronic malnutrition, assessed by child height-for-age Z scores [HAZ]), biomarkers of systemic and local inflammation (sCD14, IL-1β, CRP), or immunoregulatory cytokines (IL-4), but no correlations were observed (Supplemental Figure 6).

Protection from reinfection was not associated with the breadth of the antibody response, i.e., the number of antigens recognized by a given child. Children for whom a previous infection had been identified had a greater number of parasite antigens recognized by IgA and IgG, or greater “breadth” (Figure 4A). However, no association was found between the breadth of the antibody response and resistance to reinfection using either a data set restricted to the infants with a qPCR-verified Cryptosporidium infection prior to year 1 (Figure 4B) or using the data from the entire study cohort (Figure 4C).

Figure 4 The breadth of the anti-Cryptosporidium immune response was not correlated with protection from infection. (A) Split violin plot of antibody breadth in plasma among the 100 most-reactive antigens (Y-axis) for each isotype (X-axis) is shown for the comparison between children that had no stool samples (diarrheal or surveillance) qPCR+ for Cryptosporidium parasites (purple) and children who had a verified Cryptosporidium infection (green). (B) Data are shown from 1-year-old infants who had prior qPCR-confirmed Cryptosporidium infections (“Yr0-1 qPCR+”) and who subsequently remained uninfected (blue) or were reinfected (orange) during the next 2 years. (C) Data are shown from 1-year-old infants, including both the immunologically naive infants with no prior Cryptosporidium infections detected by qPCR in stool samples (diarrheal or surveillance) and those with qPCR+ stool samples during the first year of life. Medians and quartiles are indicated by horizontal lines in each split violin. Significant P values from Wilcoxon’s rank sum tests are shown above violins.

Validation of the array by examining the data obtained from antigens previously associated with a protective immune response. C. hominis Cp23 (Chro.40414) and Cp17, a conserved peptide encoded by the variable C. hominis gp60 gene (Chro.601380: variant IaA25R3), are both potential vaccine candidates and have been previously shown to be associated with a delay in reinfection in our study population (8, 10, 34). We investigated whether we could also detect an association between protection from reinfection and IgA antibodies recognizing the C. hominis Cp23 (Chro.40414) and C. hominis gp60 (Chro.601380) antigens on the Cryptosporidium array. As the IgA anti-Cp23 (Chro.40414) signal was low in our array data, we were only able to analyze the IgG anti-Cp23 (Chro.40414) data. As expected, a protective association was observed for both IgA and IgG against C. hominis Gp60 (Chro.601380) (Figure 5, A–D) and for IgG against C. hominis Cp23 (Chro.40414) (Figure 5, E and F). Since Gp60 and Cp23 were a priori antigen candidates, P values were not adjusted for the FDR.

Figure 5 Children with antibodies that targeted the C. hominis peptides encoded by the gp60 gene and Cp23 protein were associated with protection from reinfection. In the protein array data, IgA and IgG antibodies against the protein encoded by the C. hominis gp60 gene (Chro.60183) and IgG against Cp23 (Chro.40414) were associated with a delay in Cryptosporidium reinfection among children with a qPCR-verified Cryptosporidium infection during the first year of life (A, C, and E) or among all children in the study (B, D, and F). The X-axis shows days after the end of year 1 (when the assayed plasma samples were collected). The Y-axis shows the proportion of children who remained uninfected. Red lines represent children seronegative for the antigen, and blue lines represent seropositive children. The Kaplan-Meier curves show the probability of survival free of Cryptosporidium species, and the tables below the graphs indicate the number of children in the seropositive or seronegative categories at select time points. (A and B) IgG against Gp60 (Chro.60183). (C and D) IgA against Gp60 (Chro.60183). (E and F) IgG against Cp23 (Chro.40414). Hazard ratios (HR), confidence intervals, and P values were calculated using multivariable Cox proportional hazards models.
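The curves described in Figure 5 are Kaplan-Meier estimates of the proportion of children remaining uninfected. A minimal sketch of the product-limit estimator is given below to make that calculation concrete. The follow-up times and group assignments are hypothetical toy data, the function name is illustrative, and the sketch does not reproduce the multivariable Cox models used for the hazard ratios.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of infection-free survival.

    times:  follow-up time in days for each child
    events: True if reinfection was observed at that time,
            False if the child was censored (still uninfected)
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all children whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (days after the end of year 1).
seropositive = kaplan_meier([100, 200, 300, 365, 365],
                            [True, False, True, False, False])
seronegative = kaplan_meier([50, 120, 120, 250, 365],
                            [True, True, True, True, False])
```

In this toy example the seronegative curve falls faster than the seropositive curve, which is the qualitative pattern that a protective association would produce.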

Impact of the polymorphisms in the gp60 gene on immune reactivity. The protein encoded by the gp60 gene is processed by the parasite into Gp40 and Gp15 proteins (Figure 6A). The region of the gp60 gene that encodes the Gp40 protein has 3 variable domains: a SNP-based allelic family “type”; a variable number of trinucleotide repeats “subtypes”; and a repeat sequence “R” (40). In the Bangladeshi infant cohort, 15 different variants of gp60 were identified (2 in C. parvum and 13 in C. hominis) (Figure 6A), all of which were included in the protein array (35). The gp60 genotype of the infecting Cryptosporidium parasite was known in a subset of cases, and the data from the plasma collected from these children were examined to see whether an allele-specific immune signal could be observed. With 1 exception, the infecting C. hominis genotype matched the Gp40 antigen variant recognized by the child’s plasma (Figure 6, B and C, and Supplemental Figure 7). The number of trinucleotide repeats in the gp60 gene, however, did not impact antibody recognition: plasma from children infected with the IaA18R3 subtype bound equivalently to the Ia antigens on the array that had different numbers of trinucleotide repeats, including IaA27R3, IaA26R3, IaA25R3, IaA22R3, IaA19R3, and IaA18R3 (Figure 6, A and B). We concluded that humoral immunity to the variable Gp40 antigen was genotype-specific.

Figure 6 gp60 Genotype immune response. (A) Cartoon illustrating the proteins encoded by the gp60 gene. (B and C) Heat maps showing the intensity and breadth of the IgA (B) and IgG (C) antibody responses to the polymorphic region of the Gp40 protein. Columns show the different alleles of the peptide encoded by the gp60 gene, and rows show the signal obtained with plasma containing antibodies raised in response to infection with parasites of different gp60 genotypes. Lines at the top of each heat map indicate the protein type and on the side the genotype of the infecting parasite. Parasite genotypes: rows 1–8: IaA18R3; 9: IaA19R3; 10–16: IaA25R3; 17–20: IbA9G3R2; 21–29: IdA15G1; 30: IfA13G1; 31–34: C. parvum IIdA15G1R1. Protein alleles: columns A: IaA27R3, B: IaA26R3, C: IaA25R3, D: IaA22R3, E: IaA19R3, F: IaA18R3, G: IbA9G3R2, H: IdA14G1, I: IdA15G1, J: IeA11G3T3, K: IfA13G1, L: IfA16G1, M: IIcA5G3a, N: IIdA13G1. Side panels show the intensity scale for the amount of antibody binding to alleles expressed by IVTT and spotted on the array. The purified recombinant, relatively conserved Cp17 peptide was included on the array as a positive control for antibody binding. Its signal intensity was higher than that of the IVTT values.

Antigens associated with protection from reinfection. We tested whether a delay in the time to reinfection was associated with the development of anti-Cryptosporidium antibodies against specific antigens. This analysis was done for the children followed up to ages 2 and 3 years who had qPCR-verified cryptosporidiosis during their first year of life (Table 1 and Supplemental Figure 8). The analysis was also performed including all the children in the cohort (Supplemental Table 2 and Supplemental Figure 9). To minimize false discoveries as well as false exclusions, an antigen-filtering (feature selection) step was employed using random forest (RF) models fit to survival data. For the RF models, the children were stratified into seropositive versus seronegative for each of the 233 IgA- and/or IgG-reactive antigens, and variables that were important to the models over 100 iterations were identified (Figure 7, A and B).

Figure 7 RF analysis for selection of important antigens and analysis of risk during the first year after sampling. (A) The scatter plot represents antigens and clinical variables ranked by VIMP scores in RF using 1,000 trees constructed per model. Models were fit to survival data during 1 year of follow-up after sampling on seropositive and seronegative children that all previously had qPCR-confirmed Cryptosporidium infections. Models using the entire cohort of children and 2-year follow-up periods are shown in Supplemental Figure 8. Each model was repeated 100 times, and the VIMP score was averaged across all runs (Y-axis). For each antigen, the percentage of runs where VIMP was greater than 0 (i.e., important to the model) was calculated (X-axis). The red horizontal dashed lines represent the mean of all VIMP scores plus 1 SD. The vertical dashed red lines mark the threshold of at least 80% positive VIMP scores. The upper right quadrant shows the antigens selected as important variables in the model. (B) The horizontal bar plot represents VIMP scores for each antigen with at least 80% positive VIMP scores. The vertical red dashed line represents the cutoff for selection of important variables (equivalent to the horizontal lines in A). HRs calculated in the survival analysis are shown as protective (HR < 1, teal) or not (HR > 1, magenta). (C) Only protective antigens with at least 80% positive VIMP scores and VIMP scores above the importance cutoff were selected for individual antigen analysis. (D–G) The Kaplan-Meier plots represent the 2 most significant previously unknown antigens associated with protection in children with prior qPCR+ stool samples or all children, respectively, after feature selection using RF.
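The two selection thresholds described for Figure 7 (an average VIMP above the mean of all averaged VIMP scores plus 1 SD, and a positive VIMP in at least 80% of runs) can be written as a short filter. The sketch below assumes the per-run VIMP scores are already available from the repeated RF models and does not fit the forests themselves. The variable names and scores are hypothetical, and the use of the sample SD rather than the population SD is an assumption.

```python
from statistics import mean, stdev

def select_important(vimp_runs, min_positive=0.80):
    """Select variables passing both VIMP thresholds.

    vimp_runs: dict mapping variable name -> list of VIMP scores,
               one score per repeated RF run.
    """
    avg = {v: mean(scores) for v, scores in vimp_runs.items()}
    frac_positive = {
        v: sum(s > 0 for s in scores) / len(scores)
        for v, scores in vimp_runs.items()
    }
    all_avgs = list(avg.values())
    # Importance cutoff: mean of averaged VIMP scores plus 1 SD
    # (sample SD assumed here).
    cutoff = mean(all_avgs) + stdev(all_avgs)
    selected = sorted(
        v for v in vimp_runs
        if avg[v] > cutoff and frac_positive[v] >= min_positive
    )
    return selected, cutoff

# Hypothetical VIMP scores from 5 repeated runs for 5 variables.
vimp_runs = {
    "ag_a": [0.05, 0.06, 0.04, 0.05, 0.05],
    "ag_b": [0.01, -0.01, 0.0, 0.01, -0.01],
    "ag_c": [0.002, 0.001, 0.003, 0.002, 0.002],
    "ag_d": [-0.001, 0.0, 0.001, -0.002, 0.002],
    "ag_e": [0.001, 0.002, 0.001, 0.001, 0.0],
}
selected, cutoff = select_important(vimp_runs)
```

Variables passing this filter would then be restricted to those with HR < 1 before individual antigen analysis, as described for panel C.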

Table 1 Antibody responses associated with protection resulting from a Cryptosporidium infection

Among antigens with an average variable importance metric (VIMP) greater than 1 SD above the mean of all antigen VIMP scores and with positive VIMP scores (i.e., important to the model) in at least 80% of iterations, 7 antigens in addition to Gp60 and Cp23 had hazard ratios less than 1 (protective) in all 4 modeling groups (Figure 7C). These were selected for evaluation in Cox proportional hazards models (Figure 7B) with adjustment for the FDR. Additional RF comparisons are shown in Supplemental Figure 8 and are included in Table 1 along with the adjustment for the number of antigens tested. Additional antigens associated with protection in survival analyses, but not by random forest, are shown in Supplemental Figure 12 and Supplemental Table 3. In addition to the Gp60 and Cp23 antigens, a significant association with protection from cryptosporidiosis was observed for antibodies against a small membrane protein (Chro.30111) (Figure 7D), the Gp900 mucin (cgd7_4020) (Figure 7E), the putative metal ion transporter CpCorA (cgd2_1520) (Figure 7F), the potential mucin CpMuc8 (cgd8_700) (Figure 7G), and the coiled coil domain protein CpCCDC (cgd8_830). Parenthetically, cgd8_830 seropositivity was associated with a significantly lower incidence of infection, particularly during the first year of follow-up after sampling, but was found to be more abundant in children that ultimately were infected at the end of follow-up. Likewise, Chro.30111 antibody responses showed evidence of protection during follow-up and at the end of the first year of follow-up after sampling, but not at the end of 2 years of follow-up.
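The text does not state which procedure was used to adjust P values for the FDR. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the P values and the function name are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted P values from Cox models for 4 antigens.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```

Each adjusted value is at least as large as the raw P value, so an antigen that remains below the chosen FDR level after adjustment has passed a stricter test than the unadjusted comparison applied to the a priori candidates Gp60 and Cp23.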

A PLS-DA regression model was then used to evaluate the relative contribution of the selected antigens in defining the latent components (“loading weights”) that maximize discrimination of children by infection status (Figure 8A). The endpoint metric was complete protection from reinfection associated with antibody levels. The analysis was performed at ages 2 (Figure 8B) and 3 (Figure 8C). The contribution of each antibody to the PLS-DA profile at ages 2 (Figure 8D) and 3 (Figure 8E) is also shown.