Age-dependent impact of the major common genetic risk factor for COVID-19 on severity and mortality

Background: There is considerable variability in COVID-19 outcomes amongst younger adults, and some of this variation may be due to genetic predisposition. We characterized the clinical implications of the major genetic risk factor for COVID-19 severity, and its age-dependent effect, using individual-level data in a large international multi-centre consortium.
Method: The major common COVID-19 genetic risk factor is a chromosome 3 locus, tagged by the marker rs10490770. We combined individual-level data for 13,424 COVID-19 positive patients (N=6,689 hospitalized) from 17 cohorts in nine countries to assess the association of this genetic marker with mortality, COVID-19-related complications and laboratory values. We next examined whether the magnitude of these associations varied by age and was independent of known clinical COVID-19 risk factors.
Findings: We found that rs10490770 risk allele carriers experienced an increased risk of all-cause mortality (hazard ratio [HR] 1·4, 95% confidence interval [CI] 1·2–1·6) and COVID-19 related mortality (HR 1·5, 95%CI 1·3–1·8). Risk allele carriers had increased odds of several COVID-19 complications: severe respiratory failure (odds ratio [OR] 2·0, 95%CI 1·6–2·6), venous thromboembolism (OR 1·7, 95%CI 1·2–2·4), and hepatic injury (OR 1·6, 95%CI 1·2–2·0). Risk allele carriers ≤ 60 years had higher odds of death or severe respiratory failure (OR 2·6, 95%CI 1·8–3·9) than those > 60 years (OR 1·5, 95%CI 1·3–1·9; interaction p-value=0·04). Amongst individuals ≤ 60 years who died or experienced a severe respiratory COVID-19 outcome, we found that 31·8% (95%CI 27·6–36·2) were risk variant carriers, compared to 13·9% (95%CI 12·6–15·2) of those not experiencing these outcomes. Prediction of death or severe respiratory failure among those ≤ 60 years improved when including the risk allele (AUC 0·82 vs 0·84, p=0·016), and the predictive ability of the rs10490770 risk allele was similar to, or better than, that of most established clinical risk factors.
Interpretation: The major common COVID-19 risk locus on chromosome 3 is associated with increased risks of morbidity and mortality, and these are more pronounced amongst individuals ≤ 60 years. The effect on COVID-19 severity was similar to, or larger than, that of most established risk factors, suggesting potential implications for clinical risk management.
Funding: Funding was obtained by each of the participating cohorts individually.


Introduction
The COVID-19 pandemic has led to the death of millions of individuals and the largest economic contraction since the Great Depression [1]. The clinical outcomes of COVID-19 are remarkably variable, such that some individuals remain asymptomatic [2], while others develop severe COVID-19 with systemic inflammation, respiratory failure or death. This variability in outcome makes it difficult in clinical management to estimate who is at risk of severe disease and may need intensive care. Furthermore, recent guidelines suggest that risk stratification should be considered when deciding upon prophylactic treatment algorithms and priority for vaccination [3].
Some of this variation in COVID-19 behavior has been attributed to risk factors such as age [4], sex [4], comorbidities [5], socioeconomic factors [6] and genetic variants in the SARS-CoV-2 genome [7]. While the main risk factor for severe outcomes is age, with risk increasing exponentially after age 60 [5], some younger individuals experience severe COVID-19 outcomes and death. The early onset of several common diseases, such as breast cancer and myocardial infarction, is disproportionately influenced by human genetic factors [8][9][10], and this may also be the case for COVID-19. Several studies have identified and replicated a major genetic risk locus for severe COVID-19 [11][12][13] in the human genome. This genetic risk locus harbors a cluster of genes on chromosome 3, in which the true causal variant is still unknown. The single nucleotide polymorphism (SNP) rs10490770 serves as a marker for this genetic risk (as do other SNPs in linkage disequilibrium [14]), and approximately 15% of individuals of European ancestry carry the C risk allele [15]. However, the clinical relevance of this locus, and its potential age-dependent impact, is unknown. We therefore assembled individual-level COVID-19 clinical and human genomic data in a large international consortium of 17 cohorts in nine countries (Belgium, Brazil, Canada, Germany, Italy, Norway, Spain, Sweden, and UK) to assess the relationship between the chromosome 3 genetic risk locus and COVID-19 severity, complications and mortality. We next tested the age-dependent effects of this locus on COVID-19 outcomes. Last, in order to assess the relative importance of this locus, we compared its ability to predict COVID-19 outcomes to that of other established clinical risk factors.

medRxiv preprint doi: https://doi.org/10.1101/2021.03.07.21252875; this version posted March 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Study participants
We gathered clinical and genomic data from 13,424 COVID-19 cases (6,689 of whom were hospitalized) with genetic information available, harmonizing individual-level data from 17 studies.
COVID-19 cases were defined as individuals having at least one confirmed SARS-CoV-2 viral nucleic acid amplification test from relevant biologic fluids, or whose SARS-CoV-2 status was confirmed by ICD-10 codes U071 and/or U072. We combined data from hospital-based studies, which recruited participants after the COVID-19 outbreak, and a population-based biobank in which recruitment was not dependent upon COVID-19 status.
Detailed information for each individual study is described in the online supplement.

Statistical analysis
In order to tag the chromosome 3 locus, we selected the SNP rs10490770, which was most significantly associated with hospitalization in the COVID-19 genome-wide association study (GWAS) from the COVID-19 Host Genetics Initiative, since this is the largest GWAS meta-analysis of COVID-19 severity [13] (cases / controls = 12,888 / 1,295,966). Each participating study performed genotyping and imputation separately following a recommended quality control pipeline [16] (Table 1, Supplementary Figure 1). To test the association between rs10490770 and the phenotypes described below, we applied a dominant model, grouping participants into two groups according to their genotype at rs10490770. C is the allele associated with COVID-19 severity; those with the TC or CC genotype were labeled as carriers and those with the TT genotype were labeled as non-carriers. We chose this model because it had the lowest Akaike Information Criterion (AIC), compared to additive and recessive models (see the online supplement for detail, Supplementary Table 2), in a logistic regression for the death or severe respiratory failure outcome (defined below). All analyses were performed separately for each ancestry group. Because the sample size in non-Europeans was limited, we report the results from individuals of European descent as the main analyses, and the results from individuals of non-European ancestry in the supplement. All analyses were based on mixed-effects models adjusted for age, sex and the first five genetic principal components (PCs) as fixed effects, with study groups included as random effects to account for between-study variability. Five study groups, mostly reflecting the country of origin of the study, were created by combining small participating studies with few cases and controls to reduce the risk of collinearity (details are described in the online supplement). We further estimated the frequency of rs10490770 risk allele carrier status from the population frequencies reported in an external database (the Genome Aggregation Database [gnomAD] v3·1 [15]), assuming this variant follows Hardy-Weinberg equilibrium.
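The dominant coding and the Hardy-Weinberg step described above can be sketched as follows. This is a minimal illustration, not the study code: the function names are hypothetical, and the allele frequency used in the example is an illustrative value rather than a figure taken from gnomAD. Under Hardy-Weinberg equilibrium with risk allele frequency p, non-carriers (TT) occur with frequency (1 - p)^2, so the expected carrier frequency is 1 - (1 - p)^2.

```python
def is_carrier(genotype: str, risk_allele: str = "C") -> bool:
    """Dominant model: one or two copies of the risk allele count as a carrier."""
    return risk_allele.upper() in genotype.upper()


def carrier_frequency(risk_allele_freq: float) -> float:
    """Expected carrier proportion under Hardy-Weinberg equilibrium.

    Non-carriers are homozygous for the other allele, with frequency (1 - p)^2.
    """
    if not 0.0 <= risk_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - risk_allele_freq) ** 2


# Illustrative allele frequency only (an assumption, not a gnomAD value).
example_freq = 0.08
print(f"carrier frequency: {carrier_frequency(example_freq):.4f}")
```

With a risk allele frequency in this range, the expected carrier frequency is close to the approximately 15% quoted for individuals of European ancestry.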

Association with mortality
The hazard ratio (HR) for all-cause mortality was estimated by Cox proportional hazards models using the "coxme v2·2-16" R package. Individuals entered follow-up on the date they were diagnosed with COVID-19 or, if a diagnosis date was missing, on the date they were hospitalized or their symptoms started. Death was counted as an event at the date of death, and individuals without an event were censored at the last date of follow-up (details are described in the online supplement). We additionally performed competing risk analyses to estimate the sub-distribution hazard ratio for COVID-19 related mortality using the "cmprsk v2·2-10" R package, which accounts for the competing risk of non-COVID-19 related death, i.e. individuals who did not die of COVID-19 but died due to other causes (e.g. cancer). In the competing risk model, study groups were considered as fixed effects. Survival analyses were restricted to study participants with available follow-up and cause of death information (N=9,248). Cause of death was defined by doctor diagnoses, medical chart reviews or ICD-10 codes (details are described in the online supplement).
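The follow-up definition above can be illustrated with a short sketch that derives time at risk and an event indicator for one participant. The function and argument names are hypothetical, and the fallback order (diagnosis, then hospitalization, then symptom onset) is an assumption, since the text does not state which of the latter two dates takes priority. The survival models themselves are not reproduced here.

```python
from datetime import date
from typing import Optional, Tuple


def follow_up(
    diagnosis: Optional[date],
    hospitalization: Optional[date],
    symptom_onset: Optional[date],
    death: Optional[date],
    last_follow_up: date,
) -> Tuple[int, int]:
    """Return (days at risk, event indicator) for one participant.

    Entry date: diagnosis if known, otherwise hospitalization, otherwise
    symptom onset (assumed order). Death is the event; everyone else is
    censored at the last follow-up date.
    """
    entry = diagnosis or hospitalization or symptom_onset
    if entry is None:
        raise ValueError("no usable entry date")
    end = death if death is not None else last_follow_up
    days = (end - entry).days
    if days < 0:
        raise ValueError("end of follow-up precedes entry")
    return days, 1 if death is not None else 0


# Diagnosis date missing: entry falls back to the hospitalization date.
print(follow_up(None, date(2020, 4, 1), date(2020, 3, 28),
                date(2020, 4, 11), date(2020, 6, 1)))
```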

Association with COVID-19 severity and complications
To understand the clinical implications of the chromosome 3 locus, we fit mixed-effects regression models to assess the association of rs10490770 risk allele [C] carrier status with three types of COVID-19 outcomes: COVID-19 severity, COVID-19 complications and laboratory values. To do so, we defined three COVID-19 severity outcomes, with appropriate control definitions amongst SARS-CoV-2 positive individuals: 1) hospitalization; 2) intensive care unit (ICU) admission; and 3) death or severe respiratory failure. Hospitalization cases were COVID-19 cases admitted to the hospital, whereas controls were individuals who did not experience hospitalization. ICU cases were those COVID-19 cases admitted to the ICU, and controls were individuals who did not experience hospitalization. To assess potential selection bias, we also repeated the analyses using only individuals who were hospitalized. In these analyses, controls were defined as those who were hospitalized but not admitted to the ICU.
Death or severe respiratory failure cases were defined as individuals who died or required respiratory support (intubation, continuous positive airway pressure, bilevel positive airway pressure, continuous external negative pressure, or Optiflow/high flow positive end expiratory pressure oxygen), had ICD-10 codes for acute respiratory distress syndrome (ARDS) or acute respiratory failure ("J80", "J9600", "J9609", "Z991"), or had OPCS codes for ventilator use ("E851", "E852"). Controls for the death or severe respiratory failure cases were defined as those requiring no oxygen therapy and who were alive.
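The case and control definitions for death or severe respiratory failure can be written as a small classification rule. The sketch below is illustrative only: the function name and input fields are hypothetical, and it assumes that living individuals who received oxygen therapy without meeting the case criteria are excluded from this comparison, which is what the definitions above imply.

```python
from typing import Iterable, Optional

# Codes quoted in the text.
ARDS_OR_RESP_FAILURE_ICD10 = {"J80", "J9600", "J9609", "Z991"}
VENTILATOR_OPCS = {"E851", "E852"}


def classify_severe_outcome(
    died: bool,
    respiratory_support: bool,
    oxygen_therapy: bool,
    icd10_codes: Iterable[str] = (),
    opcs_codes: Iterable[str] = (),
) -> Optional[str]:
    """Return "case", "control", or None (excluded from this comparison)."""
    has_icd10 = bool(ARDS_OR_RESP_FAILURE_ICD10 & set(icd10_codes))
    has_opcs = bool(VENTILATOR_OPCS & set(opcs_codes))
    if died or respiratory_support or has_icd10 or has_opcs:
        return "case"
    if not oxygen_therapy:
        return "control"  # alive and never required oxygen therapy
    return None  # alive, oxygen therapy only: neither case nor control


print(classify_severe_outcome(False, False, False, icd10_codes=["J80"]))
```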
We next defined five COVID-19 related complications, which were diagnosed at hospital.
Controls for severe respiratory failure were defined as those requiring no oxygen therapy and who were alive, whereas controls for other complications were defined as those who did not meet the corresponding case criteria and were alive.
Last, we considered the laboratory values of complete blood count and biochemistry tests available at hospital (Supplementary Table 3). To test the association with the chromosome 3 locus, we used the highest or lowest value recorded per individual [19][20][21][22][23]. We selected the lowest value for lymphocyte counts and the highest value for all other tests, because we were interested in using these laboratory values as a proxy of COVID-19 severity. Definitions and quality control of laboratory values and specific codes are described in the online supplement (Supplementary Figure 2).
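The per-individual summary of laboratory values amounts to a simple rule: take the minimum for lymphocyte counts and the maximum for every other test. A minimal sketch follows; the test name "lymphocyte_count" and the function name are hypothetical labels chosen for illustration.

```python
from typing import Optional, Sequence

# Tests summarized by their lowest recorded value (hypothetical label).
LOWEST_VALUE_TESTS = {"lymphocyte_count"}


def summary_value(test_name: str,
                  measurements: Sequence[Optional[float]]) -> Optional[float]:
    """Pick the most extreme value in the direction that indicates severity."""
    values = [v for v in measurements if v is not None]
    if not values:
        return None  # no usable measurement for this individual
    if test_name in LOWEST_VALUE_TESTS:
        return min(values)
    return max(values)


print(summary_value("lymphocyte_count", [1.2, 0.6, None, 0.9]))
print(summary_value("c_reactive_protein", [12.0, 85.0, 40.0]))
```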

Age-dependent associations with COVID-19 severity
We evaluated the age-dependent effects of risk allele carrier status on the three COVID-19 severity phenotypes by performing two sets of analyses: 1) linear regressions between age at diagnosis and risk allele carrier status amongst severe cases, adjusting for the same covariates as the main analyses, and 2) adding a carrier status by age interaction term to the main regression models. Age was not dichotomized in these analyses. We also stratified participants by age ≤ 60 or > 60 years and repeated the same logistic regressions, and we estimated the frequency of risk allele carriers in the two age groups. We used 60 years as a cut-point for age-stratified analyses because COVID-19 case fatality rates increased markedly after this age [24,25].
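As a rough cross-check on age dependence, two stratum-specific odds ratios can be compared on the log scale by recovering each standard error from its 95% confidence interval. The sketch below applies this to the rounded age-stratified estimates reported in the Findings. It is an approximation under the assumption of independent strata and normally distributed log odds ratios, not the carrier status by continuous age interaction term used in the study, so it is not expected to reproduce the reported interaction p-value. Function names are hypothetical.

```python
import math
from typing import Tuple


def log_or_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def compare_odds_ratios(
    or_a: float, ci_a: Tuple[float, float],
    or_b: float, ci_b: Tuple[float, float],
) -> Tuple[float, float, float]:
    """Return (ratio of odds ratios, z statistic, two-sided p-value)."""
    diff = math.log(or_a) - math.log(or_b)
    se = math.sqrt(log_or_se(*ci_a) ** 2 + log_or_se(*ci_b) ** 2)
    z_stat = diff / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal p
    return math.exp(diff), z_stat, p_value


# Rounded stratum estimates from the Findings: <= 60 years vs > 60 years.
ratio, z_stat, p_value = compare_odds_ratios(2.6, (1.8, 3.9), 1.5, (1.3, 1.9))
print(f"ratio of ORs = {ratio:.2f}, z = {z_stat:.2f}, p = {p_value:.3f}")
```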

Associations with COVID-19 severity stratified by established clinical risk factors
In order to compare the association of rs10490770 risk allele carrier status with other risk factors, we similarly stratified participants by BMI according to the Centers for Disease Control and Prevention website [26]. All of the eight risk factors were defined by doctor diagnoses, medical chart reviews or ICD-10 codes (details are described in the online supplement). We then tested the difference in the magnitude of the associations of risk allele carrier status compared to the eight clinical risk factors. The analyses stratified by clinical risk factors and the prediction assessment (described below) were restricted to individuals with complete information for demographics, clinical risk factors and rs10490770 genotype (N=7,919). The majority of this subset were from UK Biobank (N=7,461), and only 50 individuals were included from the first discovery GWAS [11].

Risk prediction compared to established clinical risk factors
To better understand the improvement in prediction from adding the chromosome 3 genetic risk to the eight clinical risk factors, we performed multivariate regressions in individuals with complete information as described above (N=7,919). We evaluated whether the rs10490770 risk allele improved risk discrimination for severe COVID-19 outcomes by calculating the area under the receiver operating characteristic curve (AUC) and the continuous net reclassification improvement (NRI) using the "pROC v1·16·2" and "PredictABEL v1·2-4" R packages.
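The two discrimination measures named above can be computed directly from predicted risks. The sketch below is a plain-Python illustration with hypothetical function names and toy data, not a reimplementation of pROC or PredictABEL. The AUC is computed as the proportion of case-control pairs in which the case has the higher predicted risk (ties count one half), and the continuous NRI as the net proportion of events whose risk moves up plus the net proportion of non-events whose risk moves down. Significance tests for either measure are not included.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def continuous_nri(old: Sequence[float], new: Sequence[float],
                   labels: Sequence[int]) -> float:
    """Continuous (category-free) net reclassification improvement."""
    def net_up(indices):
        up = sum(new[i] > old[i] for i in indices)
        down = sum(new[i] < old[i] for i in indices)
        return (up - down) / len(indices)

    events = [i for i, y in enumerate(labels) if y == 1]
    nonevents = [i for i, y in enumerate(labels) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    return net_up(events) - net_up(nonevents)


# Toy predicted risks without and with the genetic marker (made-up values).
labels = [1, 1, 0, 0]
risk_clinical = [0.6, 0.4, 0.5, 0.3]
risk_with_snp = [0.7, 0.6, 0.4, 0.2]
print(auc(risk_clinical, labels), auc(risk_with_snp, labels))
print(continuous_nri(risk_clinical, risk_with_snp, labels))
```

The continuous NRI ranges from -2 to 2, so it is not directly comparable to a difference in AUC.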

Meta-analyses
As secondary analyses, we meta-analyzed the results with those from non-European ancestries and from two external cohorts for which we did not have access to individual-level data: FinnGen and the Columbia University COVID-19 Biobank (CUB). This resulted in a total study population of 14,620 individuals with COVID-19. Inverse-variance weighted meta-analyses were performed under fixed-effect and random-effects models using the "meta v4·16-1" R package when the appropriate phenotypes were available and the case counts, control counts, and rs10490770 risk allele carrier counts were larger than ten in each cohort.
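Inverse-variance weighted pooling can be sketched in a few lines. The example below shows a fixed-effect estimate and a random-effects estimate using the DerSimonian-Laird estimator of between-study variance; the choice of that estimator is an assumption for illustration, since the text does not say which one was used, and the function names and input values are hypothetical. Inputs are per-cohort log odds ratios (or log hazard ratios) and their standard errors.

```python
import math
from typing import Sequence, Tuple


def eligible(n_cases: int, n_controls: int, n_carriers: int) -> bool:
    """Inclusion rule from the text: all three counts larger than ten."""
    return min(n_cases, n_controls, n_carriers) > 10


def fixed_effect(estimates: Sequence[float],
                 std_errors: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def random_effects(estimates: Sequence[float],
                   std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """DerSimonian-Laird pooled estimate, standard error and tau^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled_fixed, _ = fixed_effect(estimates, std_errors)
    q = sum(w * (b - pooled_fixed) ** 2 for w, b in zip(weights, estimates))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * b for w, b in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2


# Made-up log odds ratios and standard errors for three cohorts.
betas, ses = [0.60, 0.45, 0.80], [0.15, 0.20, 0.30]
print(fixed_effect(betas, ses))
print(random_effects(betas, ses))
```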

Sensitivity analysis
Adjusting for participating studies may lead to reduced statistical power, given that some studies had only severe cases or had a disproportionate case-control ratio. To alleviate the collinearity issue, we grouped some small studies to account for study variability. This may not fully account for between-study variability. Thus, we performed two sets of sensitivity analyses in which we 1) included only five genetic PCs, without including the study of origin as random or fixed effects, and 2) included all participating studies either as fixed or random effects. Next, we performed the same analyses using UK Biobank (UKB) to provide estimates which are more representative of the general population, since this is not a COVID-19 specific cohort. We also tried binning by different cut-offs for age-stratified analyses. In order to understand whether results could have been influenced by related individuals within the samples, we selected one individual from each pair of relatives with PI_HAT (proportion of identity by descent calculated by PLINK [27]) > 0·1875 (the midpoint between the values expected for second- and third-degree relatives) and repeated the main analyses.
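The relatedness filter can be illustrated with a simple greedy rule over pairwise PI_HAT estimates. This is a sketch under stated assumptions: the text does not say which member of a related pair was retained, so the example arbitrarily keeps the first-listed individual, and the function name and sample identifiers are hypothetical. The threshold 0·1875 lies halfway between the expected PI_HAT of second-degree (0·25) and third-degree (0·125) relatives.

```python
from typing import Iterable, List, Sequence, Tuple

PI_HAT_THRESHOLD = 0.1875  # midpoint of 0.25 (second) and 0.125 (third degree)


def unrelated_subset(
    sample_ids: Sequence[str],
    pairs: Iterable[Tuple[str, str, float]],
    threshold: float = PI_HAT_THRESHOLD,
) -> List[str]:
    """Keep one individual per related pair (PI_HAT above the threshold).

    Greedy rule (an assumption): if both members of a related pair are still
    retained, drop the second-listed one.
    """
    removed = set()
    for id_a, id_b, pi_hat in pairs:
        if pi_hat > threshold and id_a not in removed and id_b not in removed:
            removed.add(id_b)
    return [s for s in sample_ids if s not in removed]


pairwise = [("A", "B", 0.50), ("B", "C", 0.25), ("D", "E", 0.10)]
print(unrelated_subset(["A", "B", "C", "D", "E"], pairwise))
```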

Role of the funding source
The funding sources had no role in study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.

Study participants
We collected and harmonized individual-level data from 13,424 patients diagnosed with COVID-19 from February 5th, 2020 to January 2nd, 2021 (Table 1).

ICU admission, we observed that risk allele carriers tend to be younger than non-carriers.
However, we did not detect a different effect in the association between rs10490770 risk allele carrier status and these additional severity phenotypes amongst those who were ≤ 60 vs > 60 years old. This could be attributed to the heterogeneity of the criteria for hospitalization or ICU admission, or to case-control imbalance in some participating studies.

Consistent with the results from multivariate regression, adding rs10490770 genotype to non-genetic risk factors improved discrimination for death or severe respiratory failure amongst those ≤ 60 years (AUC: 0·82 vs 0·84, p=0·016 and NRI 0·45, p=6·5×10⁻⁸, Table 3), and the performance of risk discrimination was similar to, or better than, that of most established risk factors included in the study (Figure 4B, Supplementary Table 9).

Meta-analyses
We meta-analyzed the European ancestry results presented above with those of non-European ancestry participants and two external cohorts. We confirmed similar effects in  Figure 7). Given the small sample size of non-European participants, we lacked sufficient statistical power to investigate whether the association between rs10490770 risk allele carrier status and COVID-19 outcomes was different when comparing individuals of non-European and European ancestry.

Sensitivity analysis
Last, we performed several sensitivity analyses to evaluate the robustness of our results. First, we removed the study variables from the covariates and instead included the top five PCs (Supplementary Tables 10-11). Second, we included participating studies themselves either as fixed or random effects (Supplementary Tables 10-11). Third, we restricted the analyses to individuals of European descent from UKB, a cohort which was not developed to study COVID-19 and thus is less prone to selection bias. These UKB analyses generated similar results (Supplementary Table 12). Fourth, we explored different cut-offs for age-stratified analyses (Supplementary Table 13). Last, we excluded related individuals (Supplementary Table 14). All sensitivity analyses were consistent with the results from the main analyses.

Discussion
Combining individual-level data from 13,424 individuals ascertained for COVID-19 outcomes from 17 cohorts in nine countries, we found that the major genetic risk factor for severe COVID-19 on chromosome 3 was strongly associated with COVID-19 related mortality and

First, amongst those ≤ 60 years, the odds of death or severe respiratory failure increased 2·6-fold for risk allele carriers. We found that 32% of individuals ≤ 60 years who died, or experienced severe respiratory failure, were risk allele carriers, compared to 14% of individuals not requiring supplemental oxygen. Second, amongst individuals who died, or experienced severe respiratory failure, risk allele carriers were on average 2·3 years younger than non-carriers. Last, the risk discrimination for death and severe respiratory COVID-19 provided by the risk allele was similar to, or larger than, that of established clinical risk factors in individuals ≤ 60 years. Other common diseases have also demonstrated larger effects of genetic risk factors at younger age [8,9]. Genetic risk factors are often clinically valuable for risk stratification in younger age groups because the frequency of other established risk factors for COVID-19 is often reduced, while the frequency of the genetic variant remains high. Moreover, this specific variant is not associated with any known COVID-19 risk factor and therefore provides orthogonal information compared to existing risk assessment tools.
Our findings suggest potential implications for clinical risk assessments in three situations. First, risk factors such as diabetes mellitus (DM) are currently used in triage to decide whether COVID-19 patients require further follow-up. Amongst individuals less than 60 years old, this genetic risk factor has a considerably larger effect size and is more common than DM. This suggests that genotyping could help to identify individuals who are at risk of severe COVID-19 outcomes and death, allowing for more tailored treatment and clinical observation. Second, amongst very ill individuals less than 60 years old, the genetic risk factor is quite common and may help to explain to patients and families why an individual has become severely ill, while others with the same clinical risk factor profile remain healthy. Last, since SARS-CoV-2 will become endemic in the human population, future public health strategies, including vaccines against novel variants of SARS-CoV-2, could be targeted to individuals at higher risk of severe outcomes. The major common genetic risk factor for severe COVID-19 could help to ensure that individuals at highest risk are prioritized for vaccine programs, thus reducing the overall burden of the disease.
There are other flanking genes, CCR1, CCR2 and CCR3 [32][33][34], whose involvement in SARS-CoV-2 infection has been suggested and which could explain the biology of the striking effect of this genetic risk. Many studies [12,29] have tried to pinpoint a causal gene or set of causal genes, but no consensus has been reached to date.
This study has important limitations. Each cohort has its own selection bias and ascertainment bias. Several studies were enriched for severe patients, whereas UKB is a non-COVID-19 cohort, with evidence of healthy volunteer bias [35]. Nevertheless, it may be less prone to selection bias than the COVID-19 cohorts. Selection bias is inherent to most COVID-19 observational studies [36] and this influences the generalizability of the results outside the study populations. To mitigate these potential issues, we combined data from observational studies with different ascertainment strategies, including national healthcare systems, studies that were established prior to the COVID-19 pandemic (in which recruitment was therefore not dependent upon COVID-19 status) and hospital-based studies. This allowed for an increased representation of individuals with severe COVID-19 outcomes. We also provide analyses
While we included information from participants who were of non-European ancestry, on-going efforts should enable larger sample sizes in these ancestries to better define the importance of the chromosome 3 risk locus in these contexts. This further emphasizes the importance of developing genomics-enabled studies in individuals of non-European ancestry.
In summary, the major genetic COVID-19 risk locus is common and has large effects on COVID-19 outcomes including mortality. These effects are age-dependent, such that the magnitude of risk increases in younger individuals. These findings suggest potential implications of genetic information in clinical risk management.

FinnGen was approved by the HUS coordinating Ethics committee. The Columbia University Biobank was approved by the Columbia University IRB.

Data sharing
The harmonized individual-level data of some participating cohorts from Belgium
Regarding the data from genetic modifiers for COVID-19 related illness (BelCovid_1), individual-level data were acquired and shared with FIMM during the sanitary crisis under an emergency consent and an ethical approval which were specific to this particular project and do not cover deposition to public repositories. Upon contact with Françoise Wilkin (Françoise.Wilkin@erasme.ulb.ac.be), Isabelle Migeotte (Isabelle.Migeotte@erasme.ulb.ac.be), or Guillaume Smits (Guillaume.Smits@erasme.ulb.ac.be), an institutional data transfer agreement can be established and data shared if the aims of data use are covered by ethical approval and patient consent. The procedure will involve an update to the ethical approval, as well as review by legal departments at both institutions, and the process will typically take 2-4 months from initial contact.
Regarding the BoSCO study, individual-level genotype and clinical data for the purpose of this study were shared with FIMM under a legal, bilateral agreement and were specific to this particular project. Current participant consents and privacy regulations prohibit deposition of individual-level data to public repositories. Upon contact with Kerstin Ludwig (kerstin.ludwig@uni-bonn.de) or Markus M. Nöthen (markus.noethen@uni-bonn.de), an institutional data transfer agreement can be established and data shared if the aims of data use are covered by ethical approvals and patient consent. The procedure will involve review by legal departments at both institutions and the process will typically take about 2 months from initial contact.
The BQC19 is an Open Science biobank. Instructions on how to access data for individuals from the BQC19 at the Jewish General Hospital site are available here: https://www.mcgill.ca/genepi/mcg-covid-19-biobank. Instructions on how to access data from other sites of the BQC19 are available here: https://www.bqc19.ca/en/access-data-samples.
For the COMRI cohort, data protection legislation does not allow for deposition of individual level data in public repositories. Upon direct contact with Prof Ulrike Protzer (protzer@tum.de, genetic data) and Dr Christoph Spinner (christoph.spinner@tum.de), an institutional data transfer agreement can be established and data will be shared if the aims of data use are covered by ethical approvals and patient consent. The procedure will involve an update to the ethical approval as well as review by legal departments at both institutions and the process will typically take 2-3 months from initial contact.
Regarding the Fondazione IRCCS Milan data (FOGS study), institutional data privacy regulations prohibit deposition of individual level data to public repositories without a specific consent. Participant written consent also does not cover public sharing of data for use for unknown purposes. Upon contact with professor Luca Valenti (luca.valenti@unimi.it) an institutional data transfer agreement can be established and data shared if the aims of data use are covered by ethical approvals and patient consent. The procedure will involve the request for an amendment to the ethical approvals, as well as review by legal departments at both institutions and the process will typically take 1-2 months from initial contact.
Regarding Norwegian data (NorCoV2), institutional data privacy regulations prohibit deposition of individual-level data to public repositories. Participant written consent also does not cover public sharing of data for use for unknown purposes. Upon contact with professor Tom H Karlsen (t.h.karlsen@medisin.uio.no) or professor Johannes R. Hov (j.e.r.hov@medisin.uio.no), an institutional data transfer agreement can be established and data shared if the aims of data use are covered by ethical approvals and patient consent. The procedure will involve an update to the ethical approvals, as well as review by legal departments at both institutions, and the process will typically take 1-2 months from initial contact.
The genetic and phenotype datasets from UK Biobank are available via the UK Biobank data access process (see http://www.ukbiobank.ac.uk/register-apply/).

Code availability
All code for data management and analysis is archived online at https://github.com/tomoconaka/COVID19-chr3 for review and reuse.