Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes

CM Lakhani, BT Tierney, AK Manrai, J Yang… - Nature …, 2019 - nature.com
Nature Genetics, 2019
Abstract
We analysed a large health insurance dataset to assess the genetic and environmental contributions to 560 disease-related phenotypes in 56,396 twin pairs and 724,513 sibling pairs out of 44,859,462 individuals living in the United States. We estimated the contribution of environmental risk factors (socioeconomic status (SES), air pollution and climate) to each phenotype. Mean heritability (h² = 0.311) and shared environmental variance (c² = 0.088) were higher than the variance attributed to specific environmental factors such as zip-code-level SES (var_SES = 0.002), daily air quality (var_AQI = 0.0004) and average temperature (var_temp = 0.001), both overall and for individual phenotypes. We found significant heritability and shared environment for a number of comorbidities (h² = 0.433, c² = 0.241) and average monthly cost (h² = 0.290, c² = 0.302). All results are available through our Claims Analysis of Twin Correlation and Heritability (CaTCH) web application.