BAMixChecker: an automated checkup tool for matched sample pairs in NGS cohort

H Chun, S Kim - Bioinformatics, 2019 - academic.oup.com
Summary
Mislabeling during next-generation sequencing (NGS) is a frequent problem that can cause an entire genomic analysis to fail, so a regular cohort-level checkup is needed to ensure that it has not occurred. We developed a new, automated tool (BAMixChecker) that accurately detects sample mismatches in a given cohort of BAM files with minimal user intervention. BAMixChecker uses a flexible, data-specific set of single-nucleotide polymorphisms and detects orphan (unpaired) and swapped (mispaired) samples based on a genotype-concordance score and entropy-based file name analysis. BAMixChecker achieves ∼100% accuracy on real WES, RNA-Seq and targeted sequencing data cohorts, even for small panels (<50 genes). BAMixChecker produces an HTML report that summarizes the sample matching status in tables and heatmaps, so that users can quickly inspect any mismatch events.
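The idea of flagging orphan and swapped samples from a genotype-concordance score can be illustrated with a small sketch. This is not BAMixChecker's implementation: it assumes genotypes have already been called at a common SNP set (one `{site: genotype}` dict per sample), takes the expected pairing labels as given instead of reproducing the entropy-based file name analysis, and uses a hypothetical concordance cutoff of 0.8. The function names `concordance` and `classify` are illustrative only.

```python
from itertools import combinations

# Hypothetical cutoff for calling two samples "same individual";
# BAMixChecker's actual threshold and scoring may differ.
MATCH_THRESHOLD = 0.8


def concordance(gt_a, gt_b):
    """Fraction of SNP sites called in both samples whose genotypes agree."""
    shared = gt_a.keys() & gt_b.keys()
    if not shared:
        return 0.0
    agree = sum(1 for site in shared if gt_a[site] == gt_b[site])
    return agree / len(shared)


def classify(genotypes, expected_pair):
    """Label each sample 'matched', 'swapped' or 'orphan'.

    genotypes:     sample name -> {snp_id: genotype string}
    expected_pair: sample name -> pair ID (assumed to come from file names)
    """
    # Find every other sample that is genetically concordant with each sample.
    partners = {s: set() for s in genotypes}
    for a, b in combinations(genotypes, 2):
        if concordance(genotypes[a], genotypes[b]) >= MATCH_THRESHOLD:
            partners[a].add(b)
            partners[b].add(a)

    status = {}
    for sample, hits in partners.items():
        if not hits:
            status[sample] = "orphan"   # no other sample shares its genotypes
        elif all(expected_pair[h] == expected_pair[sample] for h in hits):
            status[sample] = "matched"  # concordant partner carries the same label
        else:
            status[sample] = "swapped"  # concordant partner carries a different label
    return status


# Toy cohort: three individuals (A, B, C) genotyped at five SNPs.
gA = {"rs1": "0/1", "rs2": "1/1", "rs3": "0/0", "rs4": "0/1", "rs5": "1/1"}
gB = {"rs1": "0/0", "rs2": "0/1", "rs3": "1/1", "rs4": "0/1", "rs5": "0/0"}
gC = {"rs1": "1/1", "rs2": "0/0", "rs3": "0/1", "rs4": "1/1", "rs5": "0/1"}

genotypes = {"P1_T": gA, "P1_N": gA, "P2_T": gB, "P3_N": gB, "P4_T": gC}
expected = {"P1_T": "P1", "P1_N": "P1", "P2_T": "P2", "P3_N": "P3", "P4_T": "P4"}

if __name__ == "__main__":
    for name, label in classify(genotypes, expected).items():
        print(name, label)
```

In this toy cohort, P1_T and P1_N are concordant and share a label (matched), P2_T and P3_N are concordant but carry different labels (swapped), and P4_T has no concordant partner (orphan). A threshold below 1.0 leaves room for real-data effects such as genotyping errors or tumor-specific changes that lower concordance between a true tumor/normal pair.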
Availability and implementation
BAMixChecker is available at https://github.com/heinc1010/BAMixChecker
Supplementary information
Supplementary data are available at Bioinformatics online.
Oxford University Press