Characteristics of the study cohort. Table 1 shows the characteristics of the ARIA study (32) participants with asthma examined in this study. These 151 children with mild to severe persistent asthma were recruited from the Mount Sinai Health System, New York, New York, USA, with informed consent from their parents/guardians via an IRB-approved protocol. Participants had a mean age of 12 years (standard deviation 3.2 years) at the time of assessment and were of diverse self-identified racial and ethnic backgrounds (Table 1). Their asthma was generally not well controlled, with a mean score on the ACT (34) of 16.8 (maximum value 25 representing optimal control) and 96% of the cohort reporting regular use of a short-acting β-agonist rescue inhaler.

Table 1 Characteristics of the ARIA cohort participants included in this study

Children who used daily asthma controller medication (n = 84, 56%) were younger than those who did not (n = 65; P = 0.048). ICSs were used most frequently, both independently and in combination with LABAs. Children who had at least 1 lifetime emergency room visit for asthma (n = 103, 68%) were more likely to self-identify as Black or Latino (P = 0.03), had lower ACT scores than their counterparts who had never required an emergency department visit for asthma (P = 9.54 × 10⁻³), and were more likely to be taking combination ICS/LABA as their daily asthma controller medication (P = 5.97 × 10⁻³). Children who had been hospitalized overnight for asthma in their lifetime (n = 51, 34%) had significantly lower FEV1% on spirometry (P = 0.04) and higher rates of ICS/LABA (P = 2.98 × 10⁻⁷) and leukotriene receptor antagonist (P = 3.08 × 10⁻⁴) use for daily asthma treatment compared with the participants who had never been hospitalized overnight for asthma.

Air toxic characteristics. Ambient annual average concentrations for over a hundred air toxics based on emissions inventories and computer simulation models are publicly available for each US census tract in the EPA’s NATA database (31). We mapped the available toxic levels to the residential zip code for each child in our cohort. Ninety-four zip codes spanning 443 square miles across New York, New Jersey, and Connecticut were represented in this cohort. We used the closest calendar year of NATA data available subsequent to a child’s birth date. We retained only the air toxics whose levels were available for all the participants in the mapped data sets, yielding 125 air toxics for analysis.
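The two data preparation rules above (choosing a NATA year relative to birth, and retaining only air toxics available for every participant) can be sketched as follows. This is a minimal illustration, not the study's code: the list of data years, the reading of "subsequent to" as "on or after the birth year," and the function names are all assumptions.

```python
from bisect import bisect_left

# Illustrative data years only (an assumption, not taken from the text).
NATA_YEARS = [1996, 1999, 2002, 2005, 2011, 2014]

def nata_year_for_birth(birth_year, available_years=NATA_YEARS):
    """Return the earliest available data year that is >= birth_year, or None."""
    years = sorted(available_years)
    i = bisect_left(years, birth_year)
    return years[i] if i < len(years) else None

def toxics_available_for_all(exposures):
    """Keep only air toxics with a non-missing level for every participant.

    exposures: dict of participant id -> dict of air toxic name -> level
    (None marks a missing value).
    """
    records = list(exposures.values())
    if not records:
        return []
    common = set(records[0])
    for rec in records:
        common &= {name for name, level in rec.items() if level is not None}
    return sorted(common)
```

Under these assumptions, a child born in 2003 would be assigned the 2005 data year, and an air toxic missing for even one participant would be dropped from the analysis set.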

DEEP-enabled identification of combinations of air toxics associated with childhood asthma. We then applied DEEP to identify air toxic combinations associated with each of the 3 childhood asthma outcomes, namely the need for daily asthma controller medication, lifetime emergency room visit for asthma, and lifetime overnight hospitalization for asthma. In the first analytical stage of DEEP (detailed in Methods), for each outcome, the full data set was randomly split 100 times into training and test sets in an 80:20 ratio. For each split, an XGBoost model consisting of 100 decision trees was learned from the training set and evaluated on the test set in terms of the area under the receiver operating characteristic (ROC) curve (AUC score; ref. 35).
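The resampling and evaluation loop described above can be sketched in plain Python. The sketch covers only the 100 random 80:20 splits and the AUC computation; fitting the XGBoost model on each training set is deliberately omitted. The function names and the random seed are hypothetical.

```python
import random

def split_80_20(n, rng):
    """Return (train_idx, test_idx) for one random 80:20 split of n samples."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(0.8 * n))
    return idx[:cut], idx[cut:]

def auc_score(labels, scores):
    """AUC as the probability that a random positive case is scored above a
    random negative case (ties count one half); this equals the area under
    the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# 100 random splits of a cohort of 151 participants (seed chosen arbitrarily).
rng = random.Random(0)
splits = [split_80_20(151, rng) for _ in range(100)]
```

In a full pipeline, a model would be trained on each `train_idx` and its predicted probabilities on `test_idx` passed to `auc_score`.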

In the second analytical stage of DEEP, we analyzed the combinations of toxics from the XGBoost models, identified as root-to-leaf paths in the constituent decision trees, for each outcome. Note that a combination may consist of only 1 air toxic if it is sufficient to predict the outcome under consideration for a subset of the cohort, which gives DEEP flexibility in discovery. When a combination contains multiple air toxics, their sequence of appearance on the path indicates their relative order of relevance to the outcome being predicted, because variables closer to the root of a decision tree generally have higher predictive power than those closer to the leaves.
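To make the notion of a root-to-leaf path concrete, the sketch below enumerates the paths of a toy decision tree and reads off the ordered air toxic combination on each path. The nested-dictionary tree format, the thresholds, and the leaf values are invented for illustration; they are not XGBoost's internal representation or values from the study.

```python
def root_to_leaf_paths(node, prefix=()):
    """Yield one tuple per root-to-leaf path; each element is
    (feature, threshold, direction), listed from the root downward."""
    if "leaf" in node:
        if prefix:
            yield prefix
        return
    for direction, child in (("<", node["left"]), (">=", node["right"])):
        step = (node["feature"], node["threshold"], direction)
        yield from root_to_leaf_paths(child, prefix + (step,))

def combination(path):
    """Ordered air toxic names on a path, dropping repeats of the same feature."""
    seen, names = set(), []
    for feature, _, _ in path:
        if feature not in seen:
            seen.add(feature)
            names.append(feature)
    return tuple(names)

# Toy tree with made-up thresholds and leaf values.
toy_tree = {
    "feature": "acrylic acid", "threshold": 0.5,
    "left": {"leaf": -0.2},
    "right": {
        "feature": "cobalt compounds", "threshold": 0.1,
        "left": {"leaf": 0.1},
        "right": {"leaf": 0.6},
    },
}
paths = list(root_to_leaf_paths(toy_tree))
combos = [combination(p) for p in paths]
```

Here the shortest path yields a single-toxic combination, while the two deeper paths yield the ordered pair with acrylic acid as the first (primary) member.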

Next, the frequency of each combination was calculated as the number of models (out of 100) where the combination was included in at least 1 of the constituent trees. Candidate combinations were then identified as those with a frequency of at least 10. These combinations were then used in multivariable regression models to test their association with the asthma outcome of interest, while adjusting for age, sex, race and ethnicity, and income.
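The frequency filter described above can be sketched as follows. The input layout (one list of trees per model, each tree represented by the combinations found on its paths) and the function name are assumptions made for illustration.

```python
from collections import Counter

def candidate_combinations(models, min_frequency=10):
    """Count, for each combination, the number of models in which it appears
    in at least one tree, and keep those reaching min_frequency.

    models: list with one entry per fitted model; each entry is a list of
    trees, and each tree is a list of combinations (tuples of names).
    """
    freq = Counter()
    for trees in models:
        # A set ensures each model adds at most 1 to a combination's count,
        # however many of its trees contain that combination.
        in_model = {combo for tree in trees for combo in tree}
        freq.update(in_model)
    return {c: k for c, k in freq.items() if k >= min_frequency}
```

Counting models rather than trees means a combination repeated across many trees of a single model is not over-weighted.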

After the first (XGBoost) stage of DEEP, 689 air toxic profiles, comprising both individual air toxics and their combinations, were discovered across all the asthma outcomes. In the second stage of DEEP, 359 of these profiles were found to be significantly associated (P ≤ 0.05) with the respective outcome. After multiple-hypothesis correction by the Benjamini-Hochberg procedure (36), 273 air toxic profiles remained significantly associated (FDR ≤ 0.05) with at least 1 of the 3 outcomes. Our goal was to identify air toxic combinations whose increased levels are associated with adverse asthma outcomes. Therefore, among the significantly associated profiles, we focused on those in which the air toxic levels were higher than the corresponding decision tree thresholds. Among these final combinations, 18 had 1 air toxic each (Figure 2), and 20 were multi–air toxic combinations (Figure 3).
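For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal stdlib sketch of the adjustment is given below. It is a generic implementation of the standard procedure, not the authors' code, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, scaling each by m / rank
    # and taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A profile would then be retained when its adjusted value is at most 0.05, matching the FDR criterion stated above.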

Figure 2 Air toxics individually associated with childhood asthma outcomes after adjustment for age, sex, race and ethnicity, and family income in ARIA cohort participants with persistent asthma (n = 151). For each outcome and air toxic, the strength of the association is shown in terms of its odds ratio (OR), 95% confidence interval (CI), and false discovery rate (FDR). P values for individual air toxics were obtained from multivariable logistic regression models and then adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure, yielding FDR values.

Figure 3 Multi–air toxic combinations associated with childhood asthma outcomes after adjustment for age, sex, race and ethnicity, and family income in ARIA cohort participants with persistent asthma (n = 151). For each outcome and combination, the strength of the association is shown in terms of its odds ratio (OR), 95% confidence interval (CI), and false discovery rate (FDR). P values for multi–air toxic combinations were obtained from multivariable logistic regression models and then adjusted for multiple hypothesis testing using the Benjamini-Hochberg procedure, yielding FDR values.
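The ORs and 95% CIs reported in Figures 2 and 3 are derived from logistic regression coefficients. A minimal sketch of that conversion is shown here, assuming a Wald-type interval; the text does not state how the intervals were computed, so both the interval type and the function name are assumptions.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.959964):
    """Convert a logistic regression coefficient (log odds ratio) and its
    standard error into an odds ratio with a Wald 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

An interval whose lower bound exceeds 1 corresponds to a positive association between higher exposure and the outcome.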

Air toxic combinations associated with asthma outcomes. Twenty multi–air toxic combinations and 18 individual air toxics were found to be significantly associated with at least 1 of the 3 asthma outcomes. The medians and interquartile ranges (IQRs) of the exposure levels of the 34 air toxics included in these associations are shown in Table 2.

Table 2 Air toxics identified by DEEP as significantly associated with at least 1 of the 3 asthma outcomes, either individually or in combination with other air toxics

Higher levels of 18 individual air toxics were significantly associated with worse asthma outcomes (Figure 2). ORs for these associations ranged from 1.56 to 2.65. Several of the identified toxics are established risk factors for childhood asthma, particularly chemicals previously categorized as halogenated compounds, ketones, and ethers (8, 37–39). The air toxics most strongly associated with daily controller medication, emergency room visit, and overnight hospitalization were acrylic acid (OR = 2.10), mercury compounds (OR = 2.65), and ethyl chloride (OR = 1.87), respectively. Acetamide, pentachlorophenol, and polychlorinated biphenyls were associated with more than 1 asthma outcome.

A major strength of DEEP is its ability to identify multi–air toxic combinations associated with health outcomes. Indeed, here DEEP revealed significant associations between higher exposure to 20 multi–air toxic combinations and the 3 asthma outcomes of interest (Figure 3). Among these, 19 combinations included 2 air toxics and 1 included 3. The associations of these combinations were generally stronger than those of the individual air toxics, with ORs ranging from 1.60 to 3.19 (Figure 3).

Notably, acrylic acid was not only the individual air toxic most strongly associated with daily controller medication (Figure 2) but also the first (i.e., primary) member of 7 of the 9 multi–air toxic combinations associated with this outcome (Figure 3). Acrylic acid also appeared in 3 of the other 11 combinations associated with emergency room visit and overnight hospitalization for asthma (Figure 3), suggesting that it may be a major contributor to adverse asthma outcomes among children.

Three air toxic combinations were associated with lifetime emergency room visit for asthma, all with an OR of over 2 (Figure 3). Acetaldehyde, acrylamide, and acrylic acid were the primary exposures in these combinations, although they were not individually significantly associated with the outcome. Two other air toxics in these combinations, namely carbon disulfide and hydroquinone, were also not individually associated with this outcome. These findings highlight the main strength of DEEP, namely its ability to identify significant multi–air toxic combinations whose constituent air toxics may not be individually associated with the health outcome of interest.

Among the 8 air toxic combinations associated with lifetime overnight hospitalization for asthma, 1,4-dioxane, carbonyl sulfide, ethylidene dichloride, hydrochloric acid, and hydroquinone were the primary exposures (Figure 3). Both ethylidene dichloride and hydroquinone appeared in 3 of these 8 combinations, indicating that these 2 chemicals may play a role in the development of poor asthma outcomes among children. Most other air toxics in these combinations (Figure 3) were not individually associated with this outcome (Figure 2), again supporting DEEP’s ability to identify multi–air toxic combinations that may not be inferred from single air toxic associations.

Effect sizes of multi–air toxic combinations may not be evident from the individual associations of their members. Some air toxics had relatively low effect sizes when assessed individually (Figure 2) compared with the larger ORs from combination analyses (Figure 3). For example, acrylic acid was associated with daily controller medication, with an OR of 2.10 as an individual air toxic (Figure 2), but the ORs of its combinations with dimethyl phthalate, 1,1,1-trichloroethane, ethyl chloride, acetophenone, and cobalt were higher (OR 2.16 to 3.19; Figure 3). Also, none of these 5 air toxics was individually associated with the outcome. Similarly, hexachlorobenzene was associated with daily controller medication with an OR of 2.03 (Figure 2), while simultaneous exposure to the combination of hexachlorobenzene and dimethyl phthalate identified by DEEP had an OR of 2.96 (Figure 3). This was despite the fact that there was no significant individual association between dimethyl phthalate and the outcome. For the pair of toluene and phosphorus, neither air toxic was individually associated with daily controller medication (Figure 2), but their combination was associated with the outcome with an OR of 1.81 (Figure 3).

Similar cases of combinatorial effects were also seen for lifetime emergency room visit for asthma. For example, simultaneous exposure to polychlorinated biphenyls, acetaldehyde, and carbon disulfide was associated with 3.10-fold higher odds of the outcome (Figure 3), while the individual effect size of polychlorinated biphenyls was substantially lower (OR = 1.72; Figure 2). Similarly, the combination of acrylic acid and hydroquinone was significantly associated with emergency room visit with an OR of 2.73 (Figure 3), but neither was associated with the outcome individually (Figure 2).

We observed similar results for multi–air toxic combinations and lifetime overnight hospitalization for asthma. Exposure to hydroquinone was individually associated with this outcome with an OR of 1.79 (Figure 2), but in combination with ethylidene dichloride, the association was stronger (OR = 2.03; Figure 3). Similarly, carbonyl sulfide was not individually associated with this outcome (Figure 2), but it was the primary member in 2 of the multi–air toxic combinations found to be associated with overnight hospitalization (Figure 3).

In summary, the above comparison of the effect sizes of the individual air toxic (Figure 2) and multi–air toxic (Figure 3) associations demonstrated that combinations of air toxics had effects that were not fully explained by simply adding together the individual effects from their constituents. Overall, DEEP identified 34 air toxics associated with the asthma outcomes (Table 2), including 16 air toxics with significant effects only as members of combinations.

Statistical interactions among members of air toxic combinations. To assess potential synergy between members of air toxic combinations associated with asthma outcomes, we conducted statistical tests for interactions. Significant statistical interactions detected between air toxic members within the combinations are shown in Table 3. Acrylic acid was the primary air toxic (i.e., primary branch point in the decision tree) of all the combinations with significant statistical interactions. Although other combinations did not reveal significant interactions, such interactions remain possible given the limitations of statistical detection of interactions. Directed experimental work could be undertaken to test for additional interactions.
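One common way to formulate such an interaction test for a 2-member combination is a logistic model with a product term. The form below is a sketch under that assumption, since this section does not spell out the test; the symbols $A$ and $B$ (exposure indicators or levels for the 2 air toxics), $Z$ (the adjustment covariates), and the coefficients are introduced here for illustration only.

```latex
\operatorname{logit} \Pr(Y = 1) \;=\; \beta_0 + \beta_1 A + \beta_2 B + \beta_3 \,(A \times B) + \gamma^{\top} Z,
\qquad H_0 : \beta_3 = 0 .
```

Under this formulation, rejecting $H_0$ indicates that the joint association of $A$ and $B$ with the outcome departs from the product of their separate odds ratios, that is, an interaction on the multiplicative scale.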

Table 3 Air toxic combinations associated with asthma outcomes with statistically significant interactions between combination members

Representative air toxic combinations and demographic risk factors. Finally, one of the advantages of DEEP is that the trees constituting its underlying XGBoost models can be visualized and interpreted, which is difficult to do for several other machine learning methods. However, since it is difficult to simultaneously depict all the trees inferred by DEEP, we visualized sample trees that contained the most strongly associated multi–air toxic combination for each childhood asthma outcome. Sample decision trees inferred by DEEP for each of the outcomes are shown in Figures 4, 5, and 6, respectively. To provide an additional level of interpretation, we also compared the demographic characteristics (age, sex, race and ethnicity, and family income) of children exposed to each of these combinations with those of children who were not exposed (Tables 4, 5, and 6). Differences could suggest demographic risk factors that may increase a child’s exposure to these multi–air toxic combinations.
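The exposed versus not exposed grouping used for these demographic comparisons can be sketched as below. The sketch assumes that a child counts as exposed to a combination when every member air toxic exceeds the threshold at its tree node; the data layout, names, and thresholds are hypothetical.

```python
def is_exposed(levels, combination):
    """True if every air toxic in the combination exceeds its threshold.

    levels: dict of air toxic name -> concentration for one child.
    combination: list of (air toxic name, threshold) pairs from a tree path.
    A toxic missing from `levels` is treated as not exceeding the threshold.
    """
    return all(levels.get(name, float("-inf")) > thr for name, thr in combination)

def split_by_exposure(cohort, combination):
    """Partition participant ids into (exposed, unexposed) lists."""
    exposed, unexposed = [], []
    for pid, levels in cohort.items():
        (exposed if is_exposed(levels, combination) else unexposed).append(pid)
    return exposed, unexposed
```

The demographic characteristics of the two resulting groups can then be compared with whichever statistical tests suit each variable.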

Figure 4 A sample decision tree learned by DEEP to predict daily asthma controller medication from NATA-derived air toxic data geocoded to each participant (n = 149). Each node in the tree indicates the number of participants satisfying the air toxic decision path until that point and the percentage of participants with that outcome. The sample corresponding to each node is stratified into 2 subpopulations based on the air toxic and its threshold associated with the node. The multi–air toxic combination acrylic acid and cobalt compounds, which was most significantly associated with this outcome, is highlighted in red.

Figure 5 A sample decision tree learned by DEEP to predict lifetime emergency room visit for asthma from NATA-derived air toxic exposure data geocoded to each participant (n = 151). Each node in the tree indicates the number of participants satisfying the air toxic decision path until that point and the percentage of participants with that outcome. The sample corresponding to each node is stratified into 2 subpopulations based on the air toxic and its threshold associated with the node. The multi–air toxic combination of acetaldehyde, carbon disulfide, and polychlorinated biphenyls, which was most significantly associated with this outcome, is highlighted in red.

Figure 6 A sample decision tree learned by DEEP to predict lifetime overnight hospitalization for asthma from NATA-derived air toxic data geocoded to each participant (n = 151). Each node in the tree indicates the number of participants satisfying the air toxic decision path until that point and the percentage of participants with that outcome. The sample corresponding to each node is stratified into 2 subpopulations based on the air toxic and its threshold associated with the node. The multi–air toxic combination hydroquinone and ethylidene dichloride, which was most significantly associated with this outcome, is highlighted in red.

Table 4 Demographic characteristics of children exposed and not exposed to the acrylic acid and cobalt compounds combination, which was associated with daily asthma controller medication

Table 5 Demographic characteristics of children exposed and not exposed to the acetaldehyde, carbon disulfide, and polychlorinated biphenyls combination, which was associated with lifetime emergency room visit for asthma

Table 6 Demographic characteristics of children exposed and not exposed to the hydroquinone and ethylidene dichloride combination, which was associated with lifetime overnight hospitalization for asthma

The combination of acrylic acid and cobalt compounds had the highest OR among the air toxic combinations associated with daily controller medication use (OR = 3.19; Figures 3 and 4). Children exposed to this combination were older than those who were not exposed (P = 0.02; Table 4).

The combination of acetaldehyde, carbon disulfide, and polychlorinated biphenyls was the one most strongly associated with lifetime emergency room visit for asthma (OR = 3.10; Figures 3 and 5). Children exposed to this combination were younger (P = 5.34 × 10⁻⁸; Table 5) and had lower family income than those who were not exposed (P = 0.019; Table 5). Exposed children were also less likely to be White (P = 0.0046; Table 5). These observations point to social disparities among these groups of children.

The most strongly associated combination for overnight hospitalization was hydroquinone and ethylidene dichloride (OR = 2.03; Figures 3 and 6). Children exposed to this combination were younger (P = 2.18 × 10⁻³; Table 6) and had lower family incomes (P = 8.26 × 10⁻⁵; Table 6) than those who were not exposed.