Base-calling of automated sequencer traces using phred. II. Error probabilities

B Ewing, P Green - Genome Research, 1998 - genome.cshlp.org
Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
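The abstract does not show how a per-base error probability is reported or how "valid" is checked. The sketch below assumes the usual phred convention of expressing an error probability p as a quality value q = -10 log10(p) (so q = 20 means a 1 in 100 chance of error), rounded to an integer here for illustration. It then checks validity in the abstract's sense by grouping base-calls by quality value and comparing the predicted error rate with the observed one. The helper names (`phred_quality`, `error_probability`, `calibration_table`) and the input format of (predicted error probability, call-was-correct) pairs are hypothetical; this is not phred's code, and it does not model how phred derives the probabilities from trace parameters.

```python
import math
from collections import defaultdict


def phred_quality(p_error: float) -> int:
    """Map an error probability to a quality value, q = -10 * log10(p), rounded."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10.0 * math.log10(p_error))


def error_probability(q: int) -> float:
    """Inverse mapping: the error probability implied by quality value q."""
    return 10.0 ** (-q / 10.0)


def calibration_table(calls):
    """Hypothetical validity check.

    `calls` is an iterable of (predicted_error_probability, is_correct) pairs,
    e.g. from reads whose true sequence is known. Returns a dict mapping each
    quality value to (predicted error rate, observed error rate). Predictions
    are "valid" when the two rates agree in every bin.
    """
    bins = defaultdict(lambda: [0, 0])  # q -> [number of calls, number of errors]
    for p_error, is_correct in calls:
        q = phred_quality(p_error)
        bins[q][0] += 1
        if not is_correct:
            bins[q][1] += 1
    return {
        q: (error_probability(q), errors / n)
        for q, (n, errors) in sorted(bins.items())
    }


if __name__ == "__main__":
    # 100 calls each predicted to have a 1% error chance, one of which is wrong.
    calls = [(0.01, True)] * 99 + [(0.01, False)]
    for q, (predicted, observed) in calibration_table(calls).items():
        print(f"q={q}: predicted {predicted:.3f}, observed {observed:.3f}")
```

In this toy input the predicted and observed error rates coincide at q = 20. A real check would use many reads and quality bins, and a discriminating predictor would also place most incorrect calls in the low-quality bins.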