H3M2: detection of runs of homozygosity from whole-exome sequencing data

A Magi, L Tattini, F Palombo, M Benelli, A Gialluisi, B Giusti, R Abbate, M Seri, GF Gensini…
Bioinformatics, 2014
Abstract
Motivation: Runs of homozygosity (ROH) are sizable chromosomal stretches of homozygous genotypes, ranging in length from tens of kilobases to megabases. ROHs can be relevant for population and medical genetics, playing a role in predisposition to both rare and common disorders. ROHs are commonly detected by single nucleotide polymorphism (SNP) microarrays, but attempts have been made to use whole-exome sequencing (WES) data. Currently available methods developed for the analysis of uniformly spaced SNP-array maps do not fit easily to the analysis of the sparse and non-uniform distribution of the WES target design.
Results: To meet the need for an approach specifically tailored to WES data, we developed H3M2, an original algorithm based on a heterogeneous hidden Markov model that incorporates inter-marker distances to detect ROHs from WES data. We evaluated the ability of H3M2 to correctly identify ROHs on synthetic chromosomes and examined its accuracy in detecting ROHs of different lengths (short, medium and long) in real 1000 Genomes Project data. H3M2 turned out to be more accurate than GERMLINE and PLINK, two state-of-the-art algorithms, especially in the detection of short and medium ROHs.
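To illustrate what incorporating inter-marker distances into a hidden Markov model can look like, the following is a minimal Python sketch and not the published implementation. It assumes a two-state model (ROH, non-ROH), a per-marker observation that is simply homozygous or heterozygous, and a switch probability that grows with the distance between adjacent markers. The function name `viterbi_roh`, the saturating form `p_max * (1 - exp(-d / d_norm))`, and all parameter values are hypothetical choices made for this example.

```python
import math


def viterbi_roh(positions, is_het, p_max=0.1, d_norm=100_000.0,
                het_in_roh=0.01, het_outside=0.3):
    """Label each marker 1 (ROH) or 0 (non-ROH) with a two-state Viterbi pass.

    positions: sorted base-pair coordinates of the markers.
    is_het:    0 for a homozygous call, 1 for a heterozygous call.
    All default parameter values are illustrative assumptions.
    """
    if len(positions) != len(is_het):
        raise ValueError("positions and is_het must have the same length")
    if not positions:
        return []

    # log_emit[state][observation]; state 0 = non-ROH, state 1 = ROH.
    log_emit = [
        [math.log(1.0 - het_outside), math.log(het_outside)],
        [math.log(1.0 - het_in_roh), math.log(het_in_roh)],
    ]

    # Uniform prior over the two states at the first marker.
    score = [math.log(0.5) + log_emit[s][is_het[0]] for s in (0, 1)]
    back = []

    for i in range(1, len(positions)):
        # Heterogeneous transition: distant markers are allowed to switch
        # state more easily than close ones (assumed functional form).
        d = positions[i] - positions[i - 1]
        p_switch = p_max * (1.0 - math.exp(-d / d_norm))
        p_switch = min(max(p_switch, 1e-12), 1.0 - 1e-12)
        log_stay = math.log(1.0 - p_switch)
        log_switch = math.log(p_switch)

        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            if stay >= switch:
                best, prev = stay, s
            else:
                best, prev = switch, 1 - s
            new_score.append(best + log_emit[s][is_het[i]])
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Trace back the most likely state path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, closely spaced markers (as within one exon) strongly favour staying in the same state, while a large gap between targeted regions relaxes that constraint. That is one plausible way to handle the sparse, non-uniform marker spacing of WES target designs described above.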
Availability and implementation: H3M2 is a collection of bash, R and Fortran scripts and code, and is freely available at https://sourceforge.net/projects/h3m2/.
Contact: albertomagi@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Oxford University Press